Abstract

Upper and lower bounds are proved for the time complexity of the problem
of reaching agreement in a distributed network, in the presence of process
failures and inexact information about time. It is assumed that the amount
of (real) time between any two consecutive steps of any nonfaulty process
is at least $c_1$ and at most $c_2$; thus, $C = c_2/c_1$ is a measure of the
timing uncertainty. It is also assumed that the time for message delivery is
at most $d$. Processes are assumed to fail by stopping, so that process
failures can be detected by timeouts.

A straightforward adaptation of an $(f + 1)$-round round-based agreement
algorithm takes time $(f + 1)Cd$ if there are $f$ faults, while a
straightforward reduction from a timing-based algorithm to a round-based
algorithm yields a lower bound of $(f + 1)d$. The first major result of this
paper is an agreement algorithm in which the uncertainty factor $C$ is only
incurred for one round, yielding a running time of approximately $2fd + Cd$
in the worst case. The second major result shows that any agreement
algorithm must take time at least $(f - 1)d + Cd$ in the worst case.

The new agreement algorithm can also be applied in a model where
processors are synchronous ($C = 1$), and where message delay during a
particular execution of the algorithm is bounded above by a quantity
$\delta$, which could be smaller than the worst-case upper bound $d$. The
running time in this case is approximately $(2f - 1)\delta + d$.

Keywords: distributed agreement, distributed consensus, agreement,
consensus, timing uncertainty, fault-tolerance, timeout.
1  Introduction


Distributed  computing  theory  has  studied  the  complexity  requirements  of 
many  problems  in  synchronous  and  asynchronous  models  of  computation. 
There  is  an  important  middle  ground,  however,  between  the  synchronous 
and  asynchronous  extremes:  models  that  include  inexact  information  about 
timing of events. This middle ground is reasonable for modeling real
distributed systems, in which the amounts of time required for processes to
take steps, for clocks to advance, and for messages to be delivered are
generally only approximately known.

We are interested in determining the complexity of problems of the sort
arising in distributed computing theory in models with inexact timing
information. In particular, in this paper, we consider the time complexity
of the problem of fault-tolerant distributed agreement. In the version of
the agreement problem we consider, there is a system of $n$ processes,
$p_1, \ldots, p_n$, where each $p_i$ is given an input value $v_i$. Each
process that does not fail must choose a decision value such that (i) no two
processes decide differently, and (ii) if any process decides $v$ then $v$
was the input value of some process. We assume that processes fail only by
stopping. This abstract problem can be used to model a variety of problems
in distributed computing, e.g., agreement on the value of a sensor in a
real-time computing system, or agreement on whether to commit or abort a
transaction in a database system.

The  time  complexity  of  the  distributed  agreement  problem  has  been  well 
studied  in  the  synchronous  “rounds”  model.  In  this  model,  the  computation 
proceeds  in  a  sequence  of  rounds  of  communication.  In  each  round,  each 
non-failed  process  sends  out  messages  to  all  processes,  receives  all  messages 
sent  to  it  at  that  round,  and  carries  out  some  local  computation.  (See, 
for  example,  [PSL80,  LSP82,  D82,  FL82,  DLM82,  LF82,  DS83,  H84,  M85, 
DM86,  MT88,  C86,  MW88,  BGP89]  for  results  involving  time  complexity  in 
this model.) The most basic time bound results in these papers are matching
upper and lower bounds of $f + 1$ on the number of synchronous rounds of
communication required for reaching agreement in the presence of at most
$f$ faults.

We consider how these bounds are affected by using, instead of the
rounds model, one in which there is inexact timing information. In
particular, we assume that the amount of time between any two consecutive
steps of any nonfaulty process is at least $c_1$ and at most $c_2$, where
$c_1$ and $c_2$ are known constants; thus, $C = c_2/c_1$ is a measure of the
timing uncertainty. We also assume that the time for message delivery is at
most $d$.¹ Since processes are assumed to fail only by stopping, process
failures can be detected by “timeouts”; that is, if an expected message from
some process is not received within a sufficiently long time, then that
process is known to have failed. The time required to implement a timeout
is roughly $Cd$.

Initially,  we  hoped  to  be  able  to  adapt  known  results  about  the  rounds 
model  to  obtain  good  bounds  for  the  version  with  inexact  timing.  Indeed, 
an $(f + 1)$-round algorithm can be adapted in a straightforward way to
yield an algorithm for the timing-based model that requires time at most
$(f + 1)Cd$ if there are $f$ faults. On the other hand, a simple
transformation to a rounds algorithm yields a lower bound of $(f + 1)d$.
There is a significant
gap  between  these  two  bounds,  namely,  a  multiplicative  factor  equal  to  the 
timing  uncertainty,  C.  The  motivation  for  our  work  is  to  obtain  closer 
bounds  on  the  time  complexity  of  this  problem,  in  particular,  to  understand 
how  this  complexity  depends  on  C. 

The first major result of this paper is an agreement algorithm in which
the uncertainty factor $C$ is only incurred for one round, yielding a
running time of approximately $2fd + Cd$ in the worst case. This algorithm
uses timing information in a novel way in order to achieve fast time
performance. An interesting feature of the algorithm is that it can be
viewed as an asynchronous algorithm that uses a fault detection
(specifically, a timeout) mechanism. That is, the timing bounds $c_1$,
$c_2$ and $d$ are used only in the fault detection mechanism.

The second major result shows that any agreement algorithm must take
time at least $(f - 1)d + Cd$ in the worst case. The proof of this lower
bound combines ideas used in the rounds model ([FL82, DLM82, DS83, LF82,
H84, M85, CD86, DM86]), in the asynchronous model ([FLP85, DDS87]) and in
timing-based models ([AL89]). More specifically, it uses a “chain argument”
such as those used previously to prove that $f + 1$ rounds are required in
the synchronous model, a “bivalence argument” such as those used previously
to prove that fault-tolerant agreement is impossible in an asynchronous
system, and a “time stretching” argument such as those used to prove lower
bounds for resource allocation problems.

Although these bounds are not completely tight, they do demonstrate
that the time complexity only involves the “timeout bound” $Cd$ in a single
additive term; $Cd$ is not multiplied by $f$ (the total number of potential
failures) as it is in the naive algorithm. Note that this new bound
represents a significant improvement over the naive algorithm in case $C$ is
large (greater than 2), as might happen in the presence of inaccurate
processor clocks or variable-time process swapping.

¹ Results of [FLP85, DDS87] imply that if any of the bounds $c_1$, $c_2$, $d$
does not exist, then there is no agreement algorithm tolerant to even one
fault.
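The role of the threshold $C > 2$ can be seen by subtracting the approximate
new upper bound from the naive one: $(f + 1)Cd - (2fd + Cd) = fd(C - 2)$. The
following minimal Python sketch evaluates the three bounds quoted in this
introduction. The function names are illustrative assumptions, and the
formulas are the approximate expressions stated above, not the exact bounds
proved later in the paper.

```python
def naive_upper_bound(f, C, d):
    """(f + 1) * C * d: adapted round-based algorithm (approximate)."""
    return (f + 1) * C * d


def new_upper_bound(f, C, d):
    """2 * f * d + C * d: approximate running time of the new algorithm."""
    return 2 * f * d + C * d


def lower_bound(f, C, d):
    """(f - 1) * d + C * d: lower bound for any agreement algorithm."""
    return (f - 1) * d + C * d


if __name__ == "__main__":
    f, C, d = 3, 10, 1
    print(naive_upper_bound(f, C, d))  # 40
    print(new_upper_bound(f, C, d))    # 16
    print(lower_bound(f, C, d))        # 12
```

With these sample values the gap between the naive and new upper bounds is
$fd(C - 2) = 24$, and it vanishes when $C = 2$.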

Our  algorithm  also  yields  upper  bound  results  for  a  related  model  used 
by  Herzberg  and  Kutten  [HK89]  to  study  fault  detection  in  host-to-host 
protocols.  In  their  model,  process  steps  are  completely  synchronous,  that 
is,  C  =  1,  and  there  is,  as  above,  an  upper  bound  d  on  the  worst-case 
time  for  any  message  to  be  delivered.  Even  though  algorithms  must  be 
designed  to  be  correct  in  case  that  any  message  delay  is  d,  in  reality  message 
delivery  could  be  much  faster  than  d  in  many  executions.  Therefore,  it  makes 
sense to express the time complexity of an algorithm in terms of a new
parameter $\delta$, the actual message delay during execution of the
algorithm, as well as in terms of the worst-case bound $d$. Again, a
straightforward adaptation of an $(f + 1)$-round agreement algorithm gives
an agreement algorithm for this model which runs in time $(f + 1)d$, even in
executions where $\delta \ll d$. In contrast, the main agreement algorithm
of this paper runs in time approximately $(2f - 1)\delta + d$. That is, the
number of faults multiplies the actual message delay $\delta$ rather than
the worst-case delay $d$. Our lower bound techniques can be modified to give
a lower bound (of time $(2f - n)\delta + d$, if $n < 2f$) for this model.
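As a small numeric illustration of the comparison in the synchronous-process
model ($C = 1$), the sketch below evaluates the two running times quoted
above. The function names are hypothetical, and the second formula is the
approximate bound $(2f - 1)\delta + d$ as stated in the text.

```python
def round_based_time(f, d):
    """(f + 1) * d: adapted round-based algorithm, independent of delta."""
    return (f + 1) * d


def main_algorithm_time(f, delta, d):
    """(2 * f - 1) * delta + d: approximate time of the main algorithm."""
    return (2 * f - 1) * delta + d


if __name__ == "__main__":
    f, d, delta = 3, 10, 1  # actual delay much smaller than worst case
    print(round_based_time(f, d))            # 40
    print(main_algorithm_time(f, delta, d))  # 15
```

When $\delta = d$ the approximate bound becomes $2fd$, so under these
formulas the advantage comes entirely from executions in which messages
arrive faster than the worst case.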

There  has,  of  course,  been  a  considerable  amount  of  previous  work  on  the 
agreement  problem  in  various  models;  a  representative  selection  of  references 
to  this  work  appears  above.  However,  there  has  been  very  little  work  so  far 
on  this  problem  with  inexact  timing  information. 

Some  prior  work  on  distributed  agreement  in  a  model  with  inexact  timing 
information  appears  in  [DLS88].  The  main  emphasis  in  [DLS88]  was  on 
determining  the  maximum  fault  tolerance  possible  for  various  fault  models; 
only  rough  upper  bounds  on  the  time  complexity  of  the  algorithms  were 
given,  and  no  lower  bounds  on  time  were  proved.  In  contrast,  the  main 
emphasis  of  the  present  paper  is  on  time  complexity. 

Related work on the latency² of reaching agreement when processes are
not completely synchronous appears in [CASD86] and [SDC90]. These papers
assume that process clocks are synchronized to within some fixed additive
error, and the case $\delta < d$ is not considered. Unlike the results in
our paper, these results are stated in terms of clock time rather than
absolute real time. Although it is possible to translate results from those
papers into our model, doing so appears to yield results with a less precise
dependency on the timing uncertainty than we obtain here.

This work is part of an emerging study of the real-time behavior of
distributed systems. Other work in this area includes the extensive
literature on clock synchronization algorithms. (See [DHS86, HMM85, LM85,
LL84, WL88], for example.) More recently, the mutual exclusion problem has
been studied in a timing-based model with $C > 1$ [AL89]. Also, the time
complexity for a synchronizer algorithm to operate in a timing-based network
is studied in [AM90], and the time complexity of leader election algorithms
in a timing-based model appears in [CT90].

The rest of the paper is organized as follows. Section 2 contains a
description of the formal model we use for timing-based distributed systems
and  a  statement  of  the  distributed  agreement  problem.  In  Section  3,  we 
describe  a  useful  “subroutine”  for  timing  out  failed  processes.  Section  4 
contains  a  discussion  of  some  simple  upper  bound  results  that  arise  easily 
from  the  known  results  for  the  rounds  model.  In  Section  5  we  give  our  main 
upper  bound  result.  Section  6  contains  our  lower  bound  result.  Section  7 
contains  our  results  for  the  model  with  synchronous  processes  and  uncertain 
message  delivery  time.  Finally,  Section  8  contains  our  conclusions. 

2  Definitions 

2.1  Formal  Model 

In this section, we present the definitions for the underlying formal model.³

An algorithm consists of $n$ processes $p_1, \ldots, p_n$. Each process
$p_i$ is modeled as a (possibly infinite) state machine with state set
$Q_i$. The state set $Q_i$ contains a distinguished initial state $q_{0,i}$
and a distinguished fail state.

A configuration is a vector $C = (q_1, \ldots, q_n)$ where $q_i$ is the
local state of $p_i$; denote $state_i(C) = q_i$. The initial configuration
is the vector $(q_{0,1}, \ldots, q_{0,n})$. Processes communicate by sending
messages (taken from some alphabet $M$) to each other. A send action
$send(j, m)$ represents the sending of message $m$ to $p_j$. Let
$\mathcal{S}$ denote the set of all send actions $send(j, m)$ for all
$m \in M$ and all $1 \leq j \leq n$. Processes can receive inputs from some
set $V$ of values.

We model a computation of the algorithm as a sequence of configurations
alternated with events. Each event is either a computation event,
representing a computation step of a single process, a failure event,
representing the failure of some process, a delivery event, representing the
delivery of a message to a process, or an input event, representing the
arrival of a value at a process.

³ These definitions could be expressed in terms of the general timed
automaton model described in [MMT88] and [AL89]; however, we choose here to
present the definitions directly, in order to avoid the intervening layer of
definitions.

A computation event is specified by $comp(i, S)$ where $i$ is the index of
the process taking the step and $S$ is a finite subset of $\mathcal{S}$. In
the computation step associated with event $comp(i, S)$, the process $p_i$,
based on its local state, performs the send actions in $S$⁴ and possibly
changes its local state. A failure event has the form $fail(i, S)$ and
causes the send actions in $S$ to be performed; other properties of failure
events are detailed below. Each delivery event has the form $del(i, m)$ for
some $m \in M$, and each input event has the form $input(i, v)$ for some
$v \in V$. In these events, the process $p_i$, based on $m$ (or $v$) and its
local state, possibly changes its state.

Each process $p_i$ follows a deterministic protocol that determines its
state transitions and the messages it sends. In more detail, the protocol
consists of two transition functions, $\varphi_i$ for delivery and input
events, and $\gamma_i$ for computation events. For each $q \in Q_i$ and
$a \in M \cup V$, $\varphi_i(q, a)$ gives a state $q' \in Q_i$. For each
$q \in Q_i$, $\gamma_i(q)$ gives a state $q'$ and a finite set $S$ of send
actions. We assume in both cases that $q = fail$ if and only if $q' = fail$,
and we assume that $S$ is empty if $q = fail$. These conditions mean
intuitively that (i) the protocol cannot cause the process to leave the fail
state, (ii) the protocol cannot cause a process to enter the fail state from
a non-fail state, and (iii) no messages are sent from the fail state.

An execution is an infinite sequence of alternating configurations and
events

$$\alpha = C_0, \pi_1, C_1, \ldots, \pi_j, C_j, \ldots,$$

satisfying the following conditions:

1. $C_0$ is the initial configuration;

2. If $\pi_j = del(i, a)$ or $input(i, a)$, then $state_i(C_j)$ is obtained
by applying $\varphi_i$ to $state_i(C_{j-1})$ and $a$;

3. If $\pi_j = comp(i, S)$, then $state_i(C_j)$ and $S$ are obtained by
applying $\gamma_i$ to $state_i(C_{j-1})$;

4. If $\pi_j = fail(i, S)$, then $state_i(C_{j-1}) \neq fail$,
$state_i(C_j) = fail$, and $S$ is a subset of the send events obtained by
applying $\gamma_i$ to $state_i(C_{j-1})$;

5. If $\pi_j$ involves process $i$, then $state_k(C_{j-1}) = state_k(C_j)$
for every $k \neq i$;

6. (Each send is matched to a later delivery and each delivery to an earlier
send.) For each $m \in M$ and each process $p_i$, let $S(i, m)$ be the set
of $j$ such that $\pi_j$ contains a $send(i, m)$ and let $D(i, m)$ be the
set of $j$ such that $\pi_j$ is a delivery event $del(i, m)$. Then there is
a bijective mapping $\sigma_{i,m}$ from $S(i, m)$ to $D(i, m)$ such that
$\sigma_{i,m}(j) > j$ for all $j \in S(i, m)$.

⁴ In all our algorithms this will be $broadcast(m)$, that is,
$\{send(1, m), \ldots, send(n, m)\}$. A broadcast includes a message to the
sender itself.

A timed event is a pair $(\pi, t)$, where $\pi$ is an event and $t$, the
“time”, is a nonnegative real number. A timed sequence is an infinite
sequence of alternating configurations and timed events

$$\alpha = C_0, (\pi_1, t_1), C_1, \ldots, (\pi_j, t_j), C_j, \ldots,$$

where the times are nondecreasing and unbounded.

Fix real numbers $c_1$, $c_2$, and $d$, where $0 < c_1 \leq c_2 < \infty$
and $0 < d < \infty$. Letting $\alpha$ be a timed sequence as above, we say
that $\alpha$ is a timed execution provided that the following all hold:

1. $C_0, \pi_1, C_1, \ldots, \pi_j, C_j, \ldots$ is an execution;

2. There are computation or failure events for all processes with time 0;

3. There are infinitely many computation or failure events for each
process;

4. (Bounds on step time) Suppose $j < k$, the $j$th and $k$th timed events
are both either computation or failure events of the same process $p_i$,
and there are no intervening computation or failure events of $p_i$. Then
$c_1 \leq t_k - t_j \leq c_2$;

5. (Upper bound on message delivery time) If message $m$ is sent to $p_i$ at
the $j$th timed event then there exists $k > j$ such that the $k$th timed
event is the matching delivery $(del(i, m), t_k)$ (i.e.,
$\sigma_{i,m}(j) = k$) and $t_k - t_j \leq d$.
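The two timing conditions (4 and 5) lend themselves to a direct check on a
finite prefix. The following is a minimal sketch, assuming a hypothetical
representation in which the step times of one process are given as a sorted
list and each matched send/delivery pair is given as a tuple of times;
neither representation comes from the paper.

```python
def step_bounds_hold(step_times, c1, c2):
    """Condition 4: consecutive computation or failure events of one
    process are at least c1 and at most c2 apart.

    step_times is the nondecreasing list of times of that process's
    computation and failure events.
    """
    gaps = zip(step_times, step_times[1:])
    return all(c1 <= later - earlier <= c2 for earlier, later in gaps)


def delivery_bound_holds(matched_pairs, d):
    """Condition 5: every message is delivered within time d of its send.

    matched_pairs is a list of (send_time, delivery_time) tuples, one per
    send event and its matching delivery.
    """
    return all(0 <= t_del - t_send <= d for t_send, t_del in matched_pairs)


if __name__ == "__main__":
    print(step_bounds_hold([0, 1, 3], c1=1, c2=2))         # True
    print(step_bounds_hold([0, 3], c1=1, c2=2))            # False
    print(delivery_bound_holds([(0, 2), (1, 1)], d=2))     # True
```

Only finite prefixes can be checked this way; conditions 2 and 3 concern the
whole infinite sequence and are not covered by the sketch.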

Note that for any timed execution $\alpha$ and any $p_i$, there is at most
one timed event of the form $(fail(i, S), t)$. If there is such an event, we
call $t$ the failure time of $p_i$.

We define a timed execution prefix to be any finite prefix of a timed
execution (ending with a configuration). For any timed execution prefix
$\alpha$, we define $t_{end}(\alpha)$ to be the time associated with the
last event in $\alpha$ (0 if $\alpha$ contains no timed events).
We say that a process $p_i$ receives the message $m$ by time $t$ (in a
timed execution $\alpha$) if, by time $t$, $p_i$ has a computation or
failure event that is preceded in $\alpha$ by a delivery event $del(i, m)$.
For the rest of the paper let $D$ denote $d + c_2$. Note that if $m$ is sent
to $p_i$ at time $t$, then $p_i$ receives $m$ by time $t + D$. Similarly, we
say that a process $p_i$ receives the input $v$ by time $t$ if, by time $t$,
$p_i$ has a computation or failure event that is preceded in $\alpha$ by an
input event $input(i, v)$.

For any timed execution $\alpha$, we define $delay(\alpha)$ to be the
maximum delay of any message delivery in $\alpha$. When $\alpha$ is clear
from context, we will often use the notation $\delta$ to denote
$delay(\alpha)$, and will let $\Delta = \delta + c_2$.

To simplify the expression of our time bounds in terms of the parameters
$\delta$, $d$, $c_1$ and $c_2$, we sometimes approximate the bounds in the
case that $c_2 \ll \delta$. For example, in this case we have $D \approx d$
and $\Delta \approx \delta$.

2.2  The  Agreement  Problem 

We now specify the agreement problem. The original definition of the
problem in round-based systems (e.g., [LSP82]) assumes that all processes
begin executing simultaneously with their initial values already in their
states. This degree of initial synchronization is not very realistic in a
distributed network. Since we are interested in capturing timing
uncertainty, we have included input events in the definitions to permit
asynchronous starts of the protocol. Let $V$ be a set of values. We assume
that each set $Q_i$ of local states includes a subset of decision states for
each $v \in V$, such that fail is not a decision state, the sets of decision
states for different values are disjoint, and the transition functions
$\varphi_i$ and $\gamma_i$ map each decision state for $v$ to a decision
state for $v$. A process decides on $v$ by changing its state to a decision
state for $v$ (so its state thereafter is always a decision state for $v$).

A timed execution $\alpha$ (or timed execution prefix) is $f$-admissible if
$\alpha$ contains at most $f$ failure events and, for each $p_i$, exactly
one input event $input(i, v_i)$. For each $p_i$, define $start_i(\alpha)$ to
be the smallest time $t$ such that $p_i$ receives an input by time $t$.
Define $start(\alpha)$ to be the maximum of $start_i(\alpha)$ over all $i$.

Let $B$ be a mapping from the positive reals to the positive reals. An
algorithm solves the agreement problem for $f$ faults within time $B$
provided that each of its $f$-admissible timed executions $\alpha$ satisfies
the following:

1. (Agreement) No two different processes decide on different values;

2. (Validity) If some process decides on $v$, then an event $input(i, v)$
occurs in $\alpha$;⁵

3. (Termination and Time Bound) Every process either has a failure event
or makes a decision by time $start(\alpha) + B(delay(\alpha))$.
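The three conditions can be read as predicates on the observable outcome of
a single execution. The sketch below is only an illustration under assumed
data structures: `inputs` maps each process index to its input value,
`decisions` maps each process that decided to its decision value, and
`decision_times` maps it to the time of its decision. These names and
representations are hypothetical and do not appear in the paper.

```python
def satisfies_agreement(decisions):
    """Agreement: no two processes decide on different values."""
    return len(set(decisions.values())) <= 1


def satisfies_validity(decisions, inputs):
    """Validity: every decided value was the input of some process."""
    input_values = set(inputs.values())
    return all(value in input_values for value in decisions.values())


def satisfies_time_bound(decision_times, failed, n, start, bound):
    """Termination and time bound: each of the processes 1..n either is in
    the set `failed` or has decided by time start + bound."""
    deadline = start + bound
    return all(
        i in failed or decision_times.get(i, float("inf")) <= deadline
        for i in range(1, n + 1)
    )


if __name__ == "__main__":
    inputs = {1: 0, 2: 1, 3: 1}
    decisions = {1: 1, 2: 1}       # process 3 failed before deciding
    times = {1: 7, 2: 9}
    print(satisfies_agreement(decisions))                    # True
    print(satisfies_validity(decisions, inputs))             # True
    print(satisfies_time_bound(times, {3}, 3, start=2, bound=8))  # True
```

Here `bound` stands for the value $B(delay(\alpha))$ of the time-bound
mapping on the execution at hand.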

We finish this definition section with a statement of a slightly weaker
version of the agreement problem. This may be interesting because our
lower bound results still apply for the weaker problem statement. (Our
upper bound, however, satisfies the stronger problem statement given above.)
Namely, we define the agreement problem with synchronized start to be the
same as the agreement problem, except that the three properties listed above
must hold only for $f$-admissible timed executions $\alpha$ in which each
process receives its initial value at time 0; formally, for each process
$p_i$, there is a timed event $(input(i, v_i), 0)$ in $\alpha$ which
precedes every computation and failure event of $p_i$. Our default
convention is that the synchronized start condition does not hold.

We  will  carry  out  the  main  development  using  a  Boolean  version  of  the 
problem,  i.e.,  V  =  {0,1}.  Later  we  will  discuss  extensions  to  the  case  of  an 
arbitrary  value  set. 

3  A  Timeout  Strategy 

In the algorithms we describe below, it will be convenient to describe each
$p_i$ as a “parallel composition” of two tasks, a “timeout” task and a
“main” task.

The basic idea of the timeout task is very simple. At each step, each
process broadcasts an alive message. If some process $p_i$ has run for
sufficiently many steps without receiving an alive message from the process
$p_j$, then $p_i$ concludes that $p_j$ halted.

In more detail, the timeout task of $p_i$ has the following state
components: blocked, a Boolean, initially true (the purpose of blocked is to
allow the main task to stop the timeout task); a set
halted $\subseteq \{1, \ldots, n\}$, initially $\emptyset$; for each
$j \in \{1, \ldots, n\}$ an integer counter(j), initially $-1$. In addition,
the local state of each process contains a component buff, to which messages
are added at each message delivery event. Figure 1 describes the steps of
the timeout task of process $p_i$ that are associated with $comp(i, S)$
events, in precondition-effect style. Recall that $D = d + c_2$.

    Precondition:
        not blocked;

    Effect:
        broadcast((alive, i));
        for j := 1 to n do
            counter(j) := counter(j) + 1;
            if (alive, j) ∈ buff then
                remove (alive, j) from buff;
                counter(j) := 0;
            elseif counter(j) ≥ ⌊D/c_1⌋ + 1 then
                add j to halted;
        od;

Figure 1. The timeout task.

⁵ Note that this condition is slightly stronger than the usual validity
condition for distributed agreement problems.
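The timeout task of Figure 1 can be sketched in Python as follows. This is
an illustration under stated assumptions, not the paper's implementation:
the class and method names are hypothetical, `buff` is modeled as a set (so
several pending alive messages from the same sender collapse into one), and
the threshold is taken to be $\lfloor D/c_1 \rfloor + 1$ with
$D = d + c_2$.

```python
import math


class TimeoutTask:
    """Sketch of the timeout task of process p_i (names are assumptions)."""

    def __init__(self, i, n, d, c1, c2):
        self.i = i
        self.n = n
        # Number of silent steps after which a process is declared halted.
        self.threshold = math.floor((d + c2) / c1) + 1
        self.blocked = True            # the main task unblocks the timeout task
        self.halted = set()
        self.counter = {j: -1 for j in range(1, n + 1)}
        self.buff = set()              # messages added at delivery events

    def deliver(self, message):
        """Delivery event: add the message to buff."""
        self.buff.add(message)

    def step(self):
        """One computation step; returns the list of messages to broadcast."""
        if self.blocked:               # precondition: not blocked
            return []
        for j in range(1, self.n + 1):
            self.counter[j] += 1
            if ("alive", j) in self.buff:
                self.buff.discard(("alive", j))
                self.counter[j] = 0
            elif self.counter[j] >= self.threshold:
                self.halted.add(j)
        return [("alive", self.i)]


if __name__ == "__main__":
    task = TimeoutTask(i=1, n=2, d=2, c1=1, c2=1)   # D = 3, threshold = 4
    task.blocked = False
    for _ in range(5):
        for message in task.step():
            task.deliver(message)       # p_1 hears only its own alive messages
    print(task.halted)                  # {2}
```

In the usage example process 2 never sends anything, so its counter rises
from $-1$ to the threshold 4 on the fifth step, at which point it is added
to halted; process 1 keeps resetting its own counter and is never suspected.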

Assume that each local protocol includes the transitions indicated in
Figure 1. Say that a process halts at time $t$ if it either fails at time
$t$ or sets blocked to true at time $t$. We assume that if the main task of
$p_i$ sets blocked to true at some step, then the main task of $p_i$ sends
no messages at later steps. Fix a timed execution $\alpha$; we prove the
following properties for $\alpha$.

T1. If any $p_i$ adds $j$ to halted at time $t$, then $p_j$ halts, and every
message sent from $p_j$ to $p_i$ is delivered strictly before time $t$.

T2. There is a constant $T$ such that, if $p_j$ halts at time $t$, then
every $p_i$ either halts or adds $j$ to halted by time $t + T$.

To verify T1, let $p_i$ add $j$ to halted at time $t$. We first show that
$p_j$ halts. If not, then $p_j$ sends an alive message to $p_i$ at each of
its steps. The maximum difference between the times of two such consecutive
send events is $c_2$; the time between the two corresponding delivery events
is maximized by assuming that the first message takes time 0 and the second
takes time $d$. Thus, this difference is at most $D$. However, since time at
least $c_1$ elapses between every two steps of $p_i$, time at least
$c_1(\lfloor D/c_1 \rfloor + 1) > D$ must elapse between the last delivery
of an alive message from $p_j$ before time $t$ and time $t$ (when $j$ is
added to halted). This is a contradiction, so $p_j$ halts.

By a similar argument, we show that every message from $p_j$ to $p_i$ gets
delivered strictly before time $t$. Suppose that $p_j$ sends a message $m$
to $p_i$ at some step. Then, at $p_j$'s previous step, $p_j$ sends an alive
message $m'$ to $p_i$. As before, the maximum possible difference between
the times of the deliveries of $m'$ and of $m$ is at most $D$, but time
strictly greater than $D$ must elapse between the delivery of $m'$ and time
$t$. It follows that $m$ is delivered strictly before time $t$.

Now let $\delta = delay(\alpha)$, the maximum delay of any message delivery
in $\alpha$, and recall that $\Delta = \delta + c_2$. We verify T2, with a
timeout bound $T$ of approximately $Cd + \delta$. Suppose $p_j$ halts at
time $t$, so that the last alive message from $p_j$ to $p_i$ is sent no
later than time $t$. Therefore, by time $t' = t + \Delta$, $p_i$ will set
$p_j$'s counter to zero for the final time. So by time
$t' + c_2(\lfloor D/c_1 \rfloor + 1)$, $p_i$ adds $j$ to halted. Therefore,
our algorithm has the timeout bound

$$T = \Delta + c_2\left(\lfloor D/c_1 \rfloor + 1\right).$$

In case $c_2 \ll \delta$, we have $T \approx Cd + \delta$.
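The timeout bound and its approximation can be checked numerically. The
sketch below computes $T = \Delta + c_2(\lfloor D/c_1 \rfloor + 1)$ with
$D = d + c_2$ and $\Delta = \delta + c_2$; the function name and the sample
values are illustrative assumptions.

```python
import math


def timeout_bound(delta, d, c1, c2):
    """T = Delta + c2 * (floor(D / c1) + 1), with D = d + c2 and
    Delta = delta + c2."""
    D = d + c2
    Delta = delta + c2
    return Delta + c2 * (math.floor(D / c1) + 1)


if __name__ == "__main__":
    delta, d, c1, c2 = 500, 1000, 1, 2        # c2 much smaller than delta
    C = c2 / c1
    print(timeout_bound(delta, d, c1, c2))    # 2508
    print(C * d + delta)                      # 2500.0
```

For these values the exact bound exceeds the approximation $Cd + \delta$ by
8, a relative difference of about 0.3 percent.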

In  our  algorithms  that  use  the  timeout  task,  we  use  only  the  fact  that 
the  timeout  task  has  properties  T1  and  T2,  and  we  express  the  time  bounds 
of  these  algorithms  in  terms  of  the  parameter  T.  Therefore,  given  a  way  to 
detect  process  failures  with  a  timeout  bound  T  smaller  than  the  one  given 
above,  this  detection  method  could  be  used  to  improve  the  time  bounds. 
We do assume, however, that $T \geq \Delta$.

A  technical  point  must  be  made  concerning  the  parallel  composition  of 
the  timeout  task  with  the  main  task.  Whenever  a  process  takes  a  step,  we 
imagine  that  a  step  of  the  timeout  task  is  performed  first,  possibly  adding 
new  processes  to  halted.  Then  a  step  of  the  main  task  is  performed,  using  the 
(possibly)  new  set  halted.  Even  though  this  appears  to  be  two  transitions 
taken  in  sequence,  it  is  easy  to  see  that  they  can  be  combined  into  a  single 
transition. 

4  Simple  Bounds 

In  this  section  we  briefly  discuss  some  simple  algorithms  for  the  agreement 
problem  in  the  timing-based  model,  and  mention  a  simple  lower  bound. 
We first give a method for transforming a round-based algorithm to an
algorithm that works in the timing-based model.

Let $A$ be a round-based algorithm involving processes $p_i$ for
$1 \leq i \leq n$. For each round $r \geq 1$, the local protocol of $p_i$
determines the messages that $p_i$ should send at round $r$, based on the
messages received by $p_i$ at rounds less than $r$. Assume that $A$ runs for
exactly $R$ rounds and that every nonfaulty process sends a message to every
process at every round 1 through $R$. (The transformation can be easily
modified to allow some processes to halt earlier than the maximum round
$R$.)

We describe an algorithm $A'$ for the timing-based model. In this
algorithm, each process includes a timeout task, as described in the
previous section. Initially, each process sends its round 1 messages. Each
$p_i$ then waits, for each $p_j$, until it either receives the round 1
message of $p_j$ or adds $j$ to its set halted. Then $p_i$ uses $A$ to
compute its round 2 messages, and these messages are sent. Subsequent rounds
are handled similarly.

By Properties T1 and T2 of the timeout task, it should be clear that
A' simulates A correctly. To bound the time of A', let α be an arbitrary
f-admissible timed execution, and define real numbers t_r for 0 ≤ r ≤ R as
follows. (Each t_r will be shown to be an upper bound on the time for all
non-halted processes to complete the simulation of round r.) First, t_0 =
start(α). Second, define t_1 = t_0 + T if some process has a failure event
at some time t ≤ t_0; otherwise, define t_1 = t_0 + Δ. Finally, for
2 ≤ r ≤ R, define t_r = t_{r-1} + T if some process has a failure event at a
time t with t_{r-2} < t ≤ t_{r-1}; otherwise, define t_r = t_{r-1} + Δ.
Since we assume T ≥ Δ, we have t_r ≥ t_{r-1} + Δ for all r ≥ 1. It is also
easy to see that, for every r such that a failure occurs at some time
t ≤ t_{r-1}, t_r ≥ u_{r-1} + T, where u_{r-1} is the maximum time
t ≤ t_{r-1} such that a failure occurs at time t. By Property T2 of the
timeout task, it follows easily by induction on r that every process either
fails or completes round r no later than time t_r in the simulation of A by
A'. If there are at most f faults, there are at most f values of r such that
t_r = t_{r-1} + T. Therefore, A' takes time at most

T·min{f, R} + Δ·max{R - f, 0}.
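
As an illustration only, the recurrence for the deadlines t_r can be
evaluated mechanically. The sketch below is not part of the construction;
the function names round_deadlines and simulation_bound, and the
representation of failures as a plain list of times, are assumptions made
for this example.

```python
import math

def round_deadlines(t0, T, Delta, R, failure_times):
    """Return [t_0, ..., t_R] following the recurrence in the text.

    t_r = t_{r-1} + T if some failure occurs at a time in
    (t_{r-2}, t_{r-1}] (for r = 1: at any time <= t_0);
    otherwise t_r = t_{r-1} + Delta.
    """
    ts = [t0]
    for r in range(1, R + 1):
        lo = ts[r - 2] if r >= 2 else -math.inf
        hi = ts[r - 1]
        failed = any(lo < t <= hi for t in failure_times)
        ts.append(hi + (T if failed else Delta))
    return ts

def simulation_bound(T, Delta, f, R):
    """The closed-form bound T*min{f,R} + Delta*max{R-f,0}."""
    return T * min(f, R) + Delta * max(R - f, 0)

# Two failures (f = 2), at times 1 and 15, with R = 4 rounds.
deadlines = round_deadlines(0, 10, 2, 4, [1, 15])
bound = simulation_bound(10, 2, 2, 4)
```

In this run only the failure at time 1 falls inside one of the windows that
define t_1, ..., t_4, so exactly one round is charged the long timeout T and
the last deadline stays below the closed-form bound.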

Taking A to be an (f+1)-round agreement algorithm (such as the algorithm
of Dolev and Strong [DS83] appropriately modified for fail-stop faults), this
transformation gives an upper bound of fT + Δ on the time to solve the
agreement problem with f faults. In the case that c_2 ≪ δ, this bound is
approximately fCd + (f + 1)δ.

In the case of synchronized start, there is another approach that does not
perform the timeout task at every round, but runs a related timing task to
ensure that the entire algorithm runs long enough. The main agreement task
in this case uses a “flooding” strategy. If a process p_i receives a message 1
(at either an input event or a delivery event) and if p_i has not yet decided,
p_i broadcasts the message 1 and decides 1. It is easy to see that, in any
timed execution, if any correct process receives a 1, then some correct
process receives a 1 no later than time fD. Since this correct process
broadcasts a 1, all correct processes receive a 1 no later than time (f + 1)D.
Therefore, any process that has run for time strictly more than (f + 1)D can
decide 0. To ensure that this much time has elapsed, each process counts
k = ⌊(f + 1)D/c_1⌋ + 1 of its own steps. This agreement algorithm takes time
at most c_2·k. This upper bound is approximately (f + 1)Cd. (This bound is
better than the one for the simple simulation above when Cd < (f + 1)δ.)
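
The step count used by the flooding algorithm, and the comparison between
the two approximate bounds, can be checked numerically. This is a rough
sketch; the function names are hypothetical, and the two "approximate"
formulas are simply the ones quoted in the text, with lower-order terms
dropped.

```python
import math

def flooding_steps_and_time(D, c1, c2, f):
    """k = floor((f+1)D/c1) + 1 steps; the run takes time at most c2*k."""
    k = math.floor((f + 1) * D / c1) + 1
    return k, c2 * k

def approx_simulation_bound(C, d, delta, f):
    """Approximate bound for the round-by-round simulation."""
    return f * C * d + (f + 1) * delta

def approx_flooding_bound(C, d, f):
    """Approximate bound for the flooding algorithm."""
    return (f + 1) * C * d

# Flooding wins exactly when C*d < (f+1)*delta.
flooding_wins = approx_flooding_bound(2, 10, 3) < approx_simulation_bound(2, 10, 10, 3)
simulation_wins = approx_simulation_bound(10, 10, 1, 3) < approx_flooding_bound(10, 10, 3)
```

With floating-point inputs the floor may be affected by rounding; exact
rational inputs (for instance fractions.Fraction) avoid that.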

Note that both upper bounds contain the term fCd. Intuitively, this
means that these algorithms can use f sequential “long” timeouts, where a
long timeout takes time at least Cd. In the next section, we give a more
subtle algorithm with a time bound that involves only one long timeout.

As for lower bounds, for any positive integer k, it is not difficult to
translate a timing-based protocol that takes time strictly less than kd to a
round-based protocol that works in k - 1 rounds. Thus, the lower bound of
f + 1 rounds for agreement with f faults translates easily to a lower bound
of (f + 1)d time. (This bound assumes that f ≤ n - 2, since the original
round-based bound assumes this.)

5  The  Upper  Bound 

Now we present our main result, which shows how the upper bound can be
improved so that Cd is not multiplied by f, but only by 1.

Theorem 5.1 There is an algorithm to solve the agreement problem for f
faults within time (2f - 1)Δ + max{T, 3Δ}.

Substituting the value of T obtained in Section 3, the following corollary
is immediate.

Corollary 5.2 There is an algorithm to solve the agreement problem for f
faults within time 2fΔ + max{CD + c_2, 2Δ}.
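
The substitution can be spelled out as follows. Section 3 is not reproduced
here, so the form of T used below is an assumption: it is the value obtained
by matching the statement of Theorem 5.1 against that of Corollary 5.2, not
a quotation of the Section 3 definition.

```latex
\begin{align*}
(2f-1)\Delta + \max\{T,\ 3\Delta\}
  &= 2f\Delta + \max\{T - \Delta,\ 2\Delta\} \\
  &= 2f\Delta + \max\{CD + c_2,\ 2\Delta\}
  \qquad \text{assuming } T = CD + \Delta + c_2 .
\end{align*}
```

The first line only moves one copy of Δ out of the maximum; the second is
where the assumed value of T enters.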

Assuming that c_2 ≪ δ and Cd > 2δ, this upper bound is approximately
2fδ + Cd. If δ = d, the bound is approximately 2fd + Cd.

5.1 The Algorithm

In addition to the local state components of the timeout process, and halted
and blocked (as described in Section 3), we assume that the local state of
p_i contains components v_i and r, plus a component buff to hold incoming
messages, plus a component to record decisions. The component v_i is the
“input value component”: an input event input(i, v) sets v_i to v. The
component r holds a nonnegative integer phase number, initially 0. A decide(v)
operation causes p_i to enter a decision state for value v (by recording the
decision in the appropriate state component) and set blocked to true (to stop
all nontrivial transitions, including those of the timeout task).

Now we give an informal description of the algorithm, more specifically,
of the steps of process p_i that are associated with comp(i, S) events. The
algorithm is given in more detail in Figure 2. This description and the
associated code omit the timeout task behavior, as well as the handling of
inputs and delivered messages.

The  algorithm  proceeds  in  a  sequence  of  phases,  numbered  consecutively 
starting  with  0.  Each  process  attempts  to  reach  a  decision  at  each  phase; 
however,  at  even-numbered  phases,  processes  are  only  permitted  to  decide  on 
0,  whereas  at  odd-numbered  phases  they  can  only  decide  on  1.  Furthermore, 
a  process  is  only  permitted  to  decide  at  a  phase  r  provided  it  knows  that  no 
process  has  decided  at  phase  r  -  1.  Thus,  if  any  process  decides  at  phase 
r,  the  algorithm  ensures  that  no  process  can  decide  at  phase  r  +  1.  More 
strongly,  in  this  case  the  algorithm  ensures  that  every  non-failed,  undecided 
process  learns  in  phase  r  +  2  that  no  process  has  decided  at  phase  r  +  1, 
and  then  decides  at  phase  r  +  2.  Since  r  +  2  and  r  have  the  same  parity,  it 
follows  that  all  decisions  agree. 

Validity  is  ensured  by  forcing  all  non-failed  processes  to  decide  at  phase 
0  in  case  they  all  have  input  0,  and  at  phase  1  in  case  they  all  have  input  1. 
To  ensure  termination,  if  a  phase  r  occurs  during  which  no  process  fails,  and 
such  that  no  process  has  decided  up  through  phase  r,  then  the  algorithm 
ensures  that  every  nonfaulty  process  will  decide  no  later  than  phase  r  +  1. 
(Such a phase must occur among the first f + 1 phases.)

The mechanism used by the algorithm to guarantee all of these properties
is the following. If a process fails to decide at any phase r, it broadcasts
the number r before going on to the following phase r + 1. On the other
hand, if a process decides at phase r, it “skips” broadcasting r and instead
broadcasts r + 1, before deciding and terminating. In order for a process to
decide at phase r ≥ 1, it ensures that it has received the message r - 1 from
all non-halted processes, and no message r from any process. This ensures
that if a process decides at phase r then no process has decided at phase
r - 1.

Also, if some process p decides at phase r, then every undecided process
receives the message r + 1 from p at phase r + 1, but no message r from p
(since p skips sending r). This ensures that each undecided and non-failed
process broadcasts r + 1 and goes on to phase r + 2. Then every undecided,
non-failed process will receive the message r + 1 from all non-failed
processes, and no message r + 2 from any process. It follows that each
undecided, non-failed process decides at phase r + 2.


initial next-phase transition
    Precondition:
        r = 0
        v_i = 1
    Effect:
        broadcast((0, i))
        r := 1

initial decision transition
    Precondition:
        r = 0
        v_i = 0
    Effect:
        broadcast((1, i))
        decide(0)

next-phase transition
    Precondition:
        r ≥ 1
        there exists a j such that (r, j) ∈ buff
    Effect:
        broadcast((r, i))
        r := r + 1

decision transition
    Precondition:
        r ≥ 1
        for all j ∉ halted, (r - 1, j) ∈ buff
        there is no j such that (r, j) ∈ buff
    Effect:
        broadcast((r + 1, i))
        decide(r mod 2)


Figure 2. The main agreement algorithm for process p_i.

The  algorithm  allows  any  process  having  input  0  to  decide  at  phase  0. 
If  all  processes  have  input  1,  then  no  process  decides  at  phase  0.  In  this 
case,  every  non-failed  process  broadcasts  0  and  no  process  sends  1,  so  that 
every  process  has  its  precondition  for  decision  satisfied  at  phase  1.  Validity 
is  thus  guaranteed. 

For  termination,  suppose  that  a  phase  r  occurs  during  which  no  process 
fails,  and  such  that  no  process  decides  up  to  and  including  phase  r.  Then  no 
process  sends  the  message  r  +  1,  all  non-failed  processes  send  the  message 
r,  and  so  the  preconditions  for  every  process  to  decide  at  phase  r  +  1  are 
satisfied. 

The transitions corresponding to comp(i, S) events of p_i are shown in
more detail in Figure 2. The code contains preconditions for the various
cases; note that in every state of p_i, at most one of the four cases has its
precondition satisfied. Since comp(i, S) events are required to be enabled in
all states, we use the convention that any state in which none of the four
preconditions is satisfied has a “dummy” transition enabled, which causes
no changes to the state and no messages to be sent.
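
To make the case analysis concrete, the four transitions of Figure 2 can be
sketched in executable form. This is a simplified illustration, not the
formal automaton: the timeout task is omitted, the names Proc, comp_step and
run_lockstep are hypothetical, and the driver assumes a failure-free,
lock-step schedule in which every broadcast reaches all processes (the
sender included) before the next step.

```python
from dataclasses import dataclass, field
from typing import Optional, Set, Tuple

Message = Tuple[int, int]  # (phase number, sender index)

@dataclass
class Proc:
    i: int                                   # process index
    v: int                                   # input value v_i (0 or 1)
    r: int = 0                               # phase number
    buff: Set[Message] = field(default_factory=set)
    halted: Set[int] = field(default_factory=set)
    decision: Optional[int] = None           # set by a decide step

def comp_step(p: Proc, ids) -> Tuple[str, Optional[Message]]:
    """Apply the single enabled transition, or the dummy transition.

    Returns (transition name, message to broadcast or None).
    """
    if p.decision is not None:               # decided processes are blocked
        return "dummy", None
    if p.r == 0:
        if p.v == 1:                         # initial next-phase transition
            p.r = 1
            return "initial next-phase", (0, p.i)
        p.decision = 0                       # initial decision transition
        return "initial decision", (1, p.i)
    if any(phase == p.r for phase, _ in p.buff):
        msg = (p.r, p.i)                     # next-phase transition
        p.r += 1
        return "next-phase", msg
    if all((p.r - 1, j) in p.buff for j in ids if j not in p.halted):
        p.decision = p.r % 2                 # decision transition
        return "decision", (p.r + 1, p.i)
    return "dummy", None

def run_lockstep(inputs, max_steps=10):
    """Failure-free run; messages sent in a step arrive before the next."""
    procs = [Proc(i, v) for i, v in enumerate(inputs)]
    ids = range(len(procs))
    for _ in range(max_steps):
        results = [comp_step(p, ids) for p in procs]
        sent = [m for _, m in results if m is not None]
        for p in procs:
            p.buff.update(sent)
        if all(p.decision is not None for p in procs):
            break
    return [p.decision for p in procs]
```

Under these assumptions, a run with inputs 0, 1, 1 has the first process
decide 0 at phase 0 and the other two decide 0 at phase 2, matching the
parity argument given above; with all inputs equal to 1, every process
decides 1 at phase 1.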

A  formal  proof  of  correctness  appears  in  Subsection  5.2. 

We indicate why the time required for this algorithm to terminate only
involves a single occurrence of the timeout bound T ≈ Cd + δ, not multiplied
by f. Note that the only transition that occurs because of a timeout is the
(non-initial) decision transition. Suppose this transition is ever begun by a
process p_i at a phase r and no (r, j) message ever arrives at p_i. Then the
timeout can take time T, but then all non-failed processes will decide very
quickly and terminate the computation. (In fact, all such processes must
decide by the same phase r, since otherwise they would send (r, j) messages
to p_i.) On the other hand, suppose that, at all phases r prior to some
particular phase h, whenever a process p_i begins the decision transition,
some (r, j) message does arrive at p_i. Then all (r, j) messages must arrive
at p_i after the transition (or the transition would not be enabled). Then we
claim that each such phase r takes only time depending on fδ, but not on
T. This is because each (r, j) message originates (either directly or via a
chain of rebroadcasts) when some process first begins phase r. The length
of a shortest such chain can be at most f + 1 (because a non-failed process
succeeds in communicating its message to everyone). Therefore, the time
for phase r is bounded by (f + 1)δ, the length of the chain multiplied by the
time to deliver each message in the chain.

A  careful  analysis  appears  in  Subsection  5.3. 

5.2  Correctness  Proof 

When we say that a process begins a transition, we mean that the
precondition for the transition is satisfied and either the associated
comp(i, S) step or an associated fail(i, S) step is performed. Thus, this does
not necessarily mean that the transition described in the code is completed,
i.e., that the associated comp(i, S) step is performed. Note that for each
r ≥ 0, p_i begins at most one of the next-phase or decision transitions; we
call this the rth phase of p_i. Note also that if p_i decides at phase r,
then p_i completes the decision transition at phase r, so it sends the
message (r + 1, i) to all processes.

An  r-message  is  any  message  of  the  form  (r,i)  for  some  i.  It  follows  from 
the  code  that  an  r-message  is  sent  either  at  a  decision  transition  at  phase 
r  -  1,  or  at  a  next-phase  transition  at  phase  r. 

We first prove progress, i.e., that nonfaulty processes do not get “stuck”
in a phase: they either decide or advance to the next phase.

Lemma 5.3 Let r ≥ 0, and let p_i be a nonfaulty process. Then p_i either
decides at a phase strictly less than r, or begins a transition at phase r.

Proof: Suppose not. Let r be the first phase at which a nonfaulty process
gets stuck, and let p_i be a nonfaulty process that does not increase its
phase to r + 1. Since it is not possible for any process to get stuck at
phase 0, it must be that r ≥ 1. Process p_i eventually times out every process
p_j that fails or decides, by Property T2 of the timeout task.

So consider any process p_j that does not fail or decide. By choice of
r, p_j eventually reaches phase r. Since p_j does not decide at phase r - 1,
it must have set its phase to r using a next-phase transition. This implies
that p_j sends an (r - 1)-message to p_i. Hence, p_i eventually receives an
(r - 1)-message from p_j and uses it to satisfy its waiting condition for p_j.

Thus, p_i eventually satisfies its waiting conditions for all p_j and is able
to begin a transition at phase r, a contradiction to the choice of r and p_i.
■

We next give some preliminary lemmas. Some of these lemmas will also
be used later in the timing analysis.

Lemma 5.4 If p_i begins a decision transition at phase r ≥ 0, then p_i sends
no r-messages.

Proof: If r = 0, then by the initial decision transition, p_i sends no
0-messages. Assume r ≥ 1. If p_i sends r at phase r - 1, p_i begins a decision
transition at phase r - 1 and does not execute phase r. Since p_i begins a
decision transition at phase r, it does not begin a next-phase transition at
phase r, and thus does not send an r-message at phase r. ■

Lemma 5.5 If p_i decides at phase r ≥ 0, then no process begins a decision
transition at phase r + 1.

Proof: Assume, by way of contradiction, that some process p_j begins a
decision transition at phase r + 1. Then prior to this decision transition,
either an r-message from p_i is delivered to p_j, or p_j adds i to its set of
halted processes. By Lemma 5.4, p_i does not send any r-messages, so the
only possibility is that p_j adds i to halted. By the decision transition rule,
p_i succeeds in broadcasting r + 1. But, by Property T1 of the timeout task,
all messages sent by p_i to p_j are delivered to p_j before it adds i to halted.
Thus, an (r + 1)-message must be delivered to p_j before it begins the decision
transition. But this contradicts the precondition for the decision transition.

■ 

We next give a definition that will be central to both the correctness
proof and the timing analysis. A phase r is quiet if there exists a process
p_i such that no process p_j sends an r-message to p_i.

Lemma 5.6 Suppose r ≥ 1. If no process begins a decision transition at
phase r - 1, then phase r is quiet.

Proof: This is true because an earliest sending of an r-message must occur
at a decision transition at phase r - 1. ■

Lemma 5.7 If phase r is quiet, then all processes either fail or decide by
the end of phase r.

Proof: Suppose not; let p_i be a process that does not fail or decide by the
end of phase r. By Lemma 5.3, p_i must exit phase r, so it must perform
a next-phase transition at phase r. Since p_i does not fail, it broadcasts r.
This contradicts the assumption that phase r is quiet. ■

Lemma 5.8 Assume that some process decides at phase r. Then phase r + 2
is quiet and all processes either fail or decide no later than phase r + 2.

Proof: By Lemma 5.5, no process begins a decision transition at phase
r + 1. By Lemma 5.6, this implies that phase r + 2 is quiet. So by Lemma 5.7,
all processes either fail or decide no later than phase r + 2. ■

Now  we  can  prove  the  agreement  property. 

Lemma  5.9  No  two  processes  decide  on  different  values. 

Proof:  Let  r  be  the  minimal  phase  at  which  any  process  decides,  and  let 
pi  be  a  process  that  decides  at  phase  r.  By  Lemma  5.5  no  process  begins 
a  decision  transition  in  phase  r+1.  By  Lemma  5.8,  all  processes  either 
fail  or  decide  no  later  than  phase  r  +  2.  Since  r  is  minimal,  it  follows 
that  all  nonfaulty  processes  decide  at  phase  r  or  at  phase  r  +  2.  Since 
r  mod  2  =  (r  +  2)  mod  2,  they  decide  on  the  same  value.  ■ 

We  next  prove  the  validity  property. 

Lemma 5.10 If p_i decides v then there exists some p_j that starts with
v_j = v.

Proof: Assume by way of contradiction that all processes start with v' ≠ v.
If v' = 0 then all nonfaulty processes decide on 0 at phase 0. If v' = 1 then
no process begins a decision transition at phase 0, so Lemma 5.6 implies
that phase 1 is quiet, and so by Lemma 5.7 all nonfaulty processes decide
on 1 at phase 1. Either case yields a contradiction. ■

We  next  argue  termination. 

Lemma  5.11  Any  f -admissible  timed  execution  contains  a  quiet  phase, 
numbered  no  larger  than  f  +  2. 




Proof: If some process decides at phase r ≤ f, then Lemma 5.8 implies
that phase r + 2 is quiet. So suppose that no process decides at any phase
r with r ≤ f. Since there are at most f failures, there must be some phase
r, 0 ≤ r ≤ f, at which no process fails; let h be some such phase. Since
h ≤ f, no process decides at phase h. In fact, no process p_i begins a decision
transition at phase h, because otherwise p_i would complete this transition
without failing. Therefore, by Lemma 5.6, phase h + 1 ≤ f + 1 is quiet. ■

Lemma 5.12 In any f-admissible timed execution of the algorithm all
processes either fail or decide no later than phase f + 2.

Proof: By Lemma 5.11, any f-admissible timed execution contains a quiet
phase, numbered no larger than f + 2. Then Lemma 5.7 implies that all
processes either fail or decide by phase f + 2. ■

Remark 1 Our algorithm does not require an a priori upper bound on the
number of faults. All nonfaulty processes decide no later than phase f + 2,
where f is the number of faults that actually occur in the execution. In
consequence, the algorithm is an “early stopping” algorithm (cf. [DRS82]).
If an upper bound f is known a priori, the algorithm can be modified so
that, if p_i has not yet decided when it makes a next-phase transition from
phase f + 1 to phase f + 2, then p_i can immediately decide on (f + 2) mod 2.
Since p_i decides no later than the end of phase f + 2, there is no need to
actually execute phase f + 2.

5.3  Timing  Analysis 

Some notation to describe the number of failures is useful. For each r ≥ 1,
denote by f_r the number of processes whose failure step is a transition
during which an r-message should be broadcast (so this is either a decision
transition at phase r - 1 or a next-phase transition at phase r). Note that a
process has at most one failure step and thus, in all f-admissible executions,

Σ_{r≥1} f_r ≤ f.

The key idea behind the upper bound is that, if a phase r is not quiet,
then the time of the phase can be bounded above by a quantity which
depends on f_r but not on C. Moreover, the time for any phase (in particular,
the first quiet phase) is at most T ≈ Cd + δ. By Lemma 5.7, all nonfaulty
processes decide no later than the end of the first quiet phase. Since a quiet
phase must occur before too many phases have elapsed, the bound follows.

In more detail, fix an arbitrary f-admissible timed execution α. We
introduce some notation; all definitions are with respect to α. For r ≥ 0,
define t_r to be the minimum time t such that all processes either fail,
decide, or perform a phase r transition no later than time t. Note that
t_r ≤ t_{r+1} for all r, and t_0 ≤ s, where s = start(α). Let t_dec be the
minimum time t such that all processes either fail or decide no later than
time t. Let h be the smallest r such that phase r is quiet. It follows from
Lemma 5.11 that h exists and h ≤ f + 2.

It is convenient to handle the cases h = 0 and f = 0 separately. If
h = 0, then Lemma 5.7 implies that the algorithm takes time zero. If f = 0,
then since there are no failures it is easy to see that all processes decide no
later than the end of phase 2, and that phases 1 and 2 take time at most Δ
each. The time bound claimed in Theorem 5.1 is at least 2Δ when f = 0.
Henceforth we assume that h ≥ 1 and f ≥ 1.

We  begin  with  a  simple  lemma  stating  that  every  phase  takes  at  most 
time  T. 

Lemma 5.13 For any phase r ≥ 1, t_r ≤ t_{r-1} + T.

Proof: Consider any process p_i that does not fail or decide by time
t_{r-1} + T. If any process p_j decides at phase r - 1, then within time Δ
after p_j's decision transition (and so by time t_{r-1} + Δ ≤ t_{r-1} + T),
p_i receives an r-message and performs a phase r next-phase transition.

Now assume that no process decides at phase r - 1. For any process
p_j that fails or decides at or before its phase r - 1 transition, p_i puts j
into its halted set and takes a subsequent computation or failure step by
time t_{r-1} + T. Also, every process that does not fail or decide at or before
its phase r - 1 transition completes a phase r - 1 next-phase transition, in
which it sends an (r - 1)-message; this message is received by p_i by time
t_{r-1} + Δ ≤ t_{r-1} + T. Since no process decides at phase r - 1, p_i
receives no r-messages. It follows that p_i performs a phase r decision
transition by time t_{r-1} + T.

Applying the preceding argument to all p_i, we conclude that
t_r ≤ t_{r-1} + T. ■


The next lemma is the key to the upper bound. It says that the time
required by a non-quiet phase is short (in particular, independent of C).
The reason is that the length of such a phase is bounded by the time to
deliver a chain of messages of length one more than the number of failures
at that phase. The details follow.

Lemma 5.14 For any r with 1 ≤ r ≤ h - 1, t_r ≤ t_{r-1} + Δ(f_r + 1).

Proof: Let p_i be an arbitrary process. Assume that p_i does not fail, decide,
or perform a phase r transition before time t_{r-1} + Δ(f_r + 1). Since phase
r is not quiet, some process sends an r-message to p_i. By inspection of the
algorithm, there must be a sequence i_0, ..., i_k of distinct process indices
with i_k = i, such that p_{i_0} sends an r-message to p_{i_1} while performing
a decision transition at phase r - 1 and, for 1 ≤ j ≤ k - 1, p_{i_j} sends an
r-message to p_{i_{j+1}} while performing a next-phase transition at phase r.
Choosing the sequence of process indices so that k is minimized, it follows
that, for 0 ≤ j ≤ k - 2, p_{i_j} fails during the broadcast of the r-message.
For if p_{i_j} does not fail, then it sends an r-message to p_i, so
i_0, ..., i_j, i would give a path of length less than k from p_{i_0} to p_i.

By definition of f_r, we have k - 1 ≤ f_r. Since p_{i_0} sends the r-message
no later than time t_{r-1}, and p_{i_1}, ..., p_{i_k} enter phase r no later
than time t_{r-1}, it follows that p_i receives the r-message and satisfies the
precondition for a next-phase transition no later than time
t_{r-1} + kΔ ≤ t_{r-1} + (f_r + 1)Δ. ■

Now  by  induction  we  have: 

Corollary 5.15 For every r with 1 ≤ r ≤ h - 1,
t_r ≤ Δ · Σ_{i=1}^{r} (f_i + 1) + s.
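
The induction amounts to a running sum over the per-phase failure counts.
As a small sketch (the function name and the list representation of
f_1, f_2, ... are assumptions of this example), the bounds for successive
non-quiet phases can be tabulated as follows.

```python
from itertools import accumulate

def phase_time_bounds(s, Delta, failures_per_phase):
    """Upper bounds on t_1, t_2, ... for phases before the first quiet one.

    failures_per_phase[k] plays the role of f_{k+1}; the r-th entry of the
    result is s + Delta * sum_{i=1..r} (f_i + 1).
    """
    running = accumulate(f_r + 1 for f_r in failures_per_phase)
    return [s + Delta * total for total in running]

# Example: f_1 = 1, f_2 = 0, f_3 = 2 with s = 0 and Delta = 2.
bounds = phase_time_bounds(0, 2, [1, 0, 2])
```

Each phase contributes Δ for its own message delay plus Δ for every failure
charged to it, and the timeout bound T does not appear.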

At this point, we can give a simple proof of an upper bound result that
is slightly weaker than the one claimed in Theorem 5.1. We include this
result here in order to give the reader an intuition why the bound takes the
general form it does (with the timeout bound T appearing only once).

Theorem 5.16 There is an algorithm to solve the agreement problem for f
faults within time (2f + 1)Δ + T.

Again assuming c_2 ≪ δ, this bound is approximately (2f + 2)δ + Cd.

Proof: By Lemma 5.7, we have t_dec ≤ t_h. Lemma 5.13 implies that
t_r ≤ t_{r-1} + T for any phase r. Therefore, t_dec ≤ t_{h-1} + T. Now

    t_dec ≤ t_{h-1} + T
          ≤ Δ · Σ_{i=1}^{h-1} (f_i + 1) + T + s     by Corollary 5.15,
          ≤ (f + (h - 1))Δ + T + s
          ≤ (2f + 1)Δ + T + s                       since h ≤ f + 2.

■

Now we carry out the finer analysis needed to get the smaller bound
given in Theorem 5.1. The smaller bound is close (within O(c_2(C + f)))
to the actual worst-case running time of the algorithm; details are given in
Remark 2 below. The better bound is obtained by considering the latest
time at which a failure occurs. If this time is not too large, then a better
bound can be obtained since the time T taken by the timeout task can then
be measured starting from the time of the latest failure. Let t_last be the
maximum time such that t_last ≤ t_{h-1} and such that some process has a
failure event at time t_last. If no process has a failure event at a time
≤ t_{h-1}, then take t_last = -T. We begin with an upper bound on t_dec that
may be smaller than the bound t_{h-1} + T used in the proof of Theorem 5.16.

Lemma 5.17 t_dec ≤ max{t_{h-1} + Δ, t_last + T}.

Proof: By Lemma 5.7, t_dec ≤ t_h, so it is sufficient to bound t_h. Let p_i be
a process that does not fail, decide, or perform a phase h transition before
time t_max = max{t_{h-1} + Δ, t_last + T}. Let p_j be an arbitrary process. We
show that, by time t_max, either j is in p_i's halted set or p_i receives an
(h - 1)-message or an h-message from p_j. Therefore, by time t_max, p_i
performs a phase h transition.

If p_j fails at time t where t ≤ t_{h-1}, then t ≤ t_last, so p_i adds j to its
halted set no later than time t_last + T (by Property T2 of the timeout task).
In the remaining cases, assume that p_j does not fail at a time t ≤ t_{h-1}.

Suppose that p_j performs a transition at phase h - 1. Since p_j does not
fail at this transition, p_j sends either an (h - 1)-message or an h-message to
p_i. Since the sending is done no later than time t_{h-1}, p_i receives the
message no later than time t_{h-1} + Δ.

The only other possibility is that p_j decides at some phase r ≤ h - 2.
Since p_i does not fail or decide by the end of phase h - 1, it follows from
Lemma 5.8 that p_j does not decide at any phase r ≤ h - 3. Therefore, p_j
decides at phase h - 2 and broadcasts an (h - 1)-message. As in the previous
case, this message is received by p_i no later than time
t_{h-2} + Δ ≤ t_{h-1} + Δ.

■ 


We now use Lemma 5.17 to bound t_dec.

Lemma 5.18 t_dec ≤ max{(2f + 2)Δ, (2f - 1)Δ + T} + s.

Proof: We consider three cases.

Case 1: h ≤ f.

Since t_dec ≤ t_{h-1} + T, Corollary 5.15 gives

    t_dec ≤ t_{h-1} + T
          ≤ Δ · Σ_{i=1}^{h-1} (f_i + 1) + T + s
          ≤ (f + (h - 1))Δ + T + s
          ≤ (2f - 1)Δ + T + s.

Case 2: f + 1 ≤ h ≤ f + 2 and t_last ≤ t_{f-1}.

First, since f - 1 ≤ h - 1 we have

    t_last ≤ t_{f-1} ≤ Δ · Σ_{i=1}^{f-1} (f_i + 1) + s ≤ (2f - 1)Δ + s.

Since h - 1 ≤ f + 1 we have

    t_{h-1} ≤ Δ · Σ_{i=1}^{h-1} (f_i + 1) + s ≤ (2f + 1)Δ + s.

Substituting these bounds for t_last and t_{h-1} into Lemma 5.17 gives

    t_dec ≤ max{(2f + 1)Δ + s + Δ, (2f - 1)Δ + s + T}
          = max{(2f + 2)Δ, (2f - 1)Δ + T} + s.

Case 3: f + 1 ≤ h ≤ f + 2 and t_last > t_{f-1}.

Claim 5.19 f_r > 0 for 1 ≤ r ≤ f - 1.

Proof: Suppose that f_r = 0 for some r ≤ f - 1. Since phase r is not quiet,
some process sends an r-message, and the earliest sending of an r-message
must be at a decision transition at phase r - 1. Since f_r = 0 means that
there are no failures during a broadcast of an r-message, it follows that some
process decides at phase r - 1. By Lemma 5.8, phase r + 1 is quiet. Since
r + 1 ≤ f, this contradicts the assumption that phase h ≥ f + 1 is the first
quiet phase. ■

Since  phase  /  is  not  quiet,  a  /-message  is  sent  by  some  process.  Let  p  be 
a  process  that  sends  an  /-message  at  the  earliest  time.  Therefore,  p  sends 


23 


the  /-message  while  performing  a  decision  transition  at  phase  /  —  1,  and 
this  occurs  no  later  than  time  t/-i. 

We  first  argue  that  p  decides  at  phase  /  —  1.  If  not,  then  p  fails  no  later 
than  time  f/_i  while  broadcasting  an  /-message.  Since  fr  >  0  for  r  <  f  - 1, 
the  remaining  /  —  1  failures  occur  while  some  process  is  broadcasting  an 
r-message  for  each  r  with  1  <  r  <  /  -  1.  Since  these  remaining  failures 
occur  at  phases  numbered  at  most  /-  1,  it  follows  that  all  failures  occur  no 
later  than  time  tj-\.  This  contradicts  the  assumption  that  tia,t  >  tj-\. 

Since  p  decides  at  phase  /-— 1,  h  =  / +1  by  Lemma  5.8,  and  p  broadcasts 
an  /-message  no  later  than  time  i/_ j.  Therefore 

4-1  =  t}  <  tj- 1  4  A.  (1) 

The  final  ingredient  for  this  case  is  the  observation  that 

/-i 

£/(</-!• 

«•= 1 

Otherwise,  all  failures  occur  during  the  broadcast  of  r-tnessages  for  1  <  r  < 
/  —  1;  as  argueu  above,  this  contradicts  the  assumption  that  </„,<  >  t/_i. 

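Finally, we have

$$\begin{aligned}
t_{dec} &\le t_{h-1} + T \\
&\le t_{f-1} + \Delta + T && \text{by (1)} \\
&\le \Delta \cdot \Bigl((f-1) + \sum_{r=1}^{f-1} f_r\Bigr) + s + \Delta + T \\
&\le ((f-1) + (f-1))\Delta + s + \Delta + T && \text{by (2)} \\
&= (2f-1)\Delta + T + s.
\end{aligned}$$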


■ 

Since the upper bound of Lemma 5.18 can be written as
$(2f-1)\Delta + \max\{T, 3\Delta\} + s$, the proof of Theorem 5.1 is complete.
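
The rewriting used in this last step rests on the identity $(2f+2)\Delta = (2f-1)\Delta + 3\Delta$. The following is a minimal sketch that checks the two forms of the bound against each other on sample values; the function names `lemma_form` and `theorem_form` are hypothetical and are not taken from the paper.

```python
def lemma_form(f, cap_delta, T, s):
    """Bound written as max{(2f+2)Δ, (2f-1)Δ + T} + s."""
    return max((2 * f + 2) * cap_delta, (2 * f - 1) * cap_delta + T) + s


def theorem_form(f, cap_delta, T, s):
    """Same bound written as (2f-1)Δ + max{T, 3Δ} + s."""
    return (2 * f - 1) * cap_delta + max(T, 3 * cap_delta) + s


if __name__ == "__main__":
    # Illustrative values only: f faults, Δ = 2, timeout bound T = 50, s = 1.
    print(lemma_form(3, 2, 50, 1), theorem_form(3, 2, 50, 1))
```

Which term dominates depends only on whether $T$ exceeds $3\Delta$; the number of faults does not affect the comparison.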

Remark 2 It is possible to construct an execution of the algorithm that
takes time at least $2f\delta + Cd$, assuming $1 \le f \le n-2$ and $C > 2$. Hints:
One process has initial value 0 and the others have initial value 1; $f_r = 1$
for $1 \le r \le f-1$, $f_r = 0$ for $r = f$, and $f_r = 1$ for $r = f+1$; the message
delivery times are arranged so that phases $1, 2, \ldots, f-1$ take time $2\delta$ each,
phase $f$ takes time $\delta$, and phase $f+1$ takes time $T \ge Cd + \delta$.
Remark 3 The agreement algorithm has high message complexity. This is
due mainly to the timeout task, where every process broadcasts a message
at every step; the main task sends a total of only $O(n^2 f)$ messages, since each
process broadcasts a message at each phase transition. An obvious approach
for decreasing the message complexity of the timeout task is to broadcast the
alive message once every $k$ steps for some $k \ge 2$. Of course, the maximum
value of the counters must then be adjusted upward, and the timeout bound
$T$ increases accordingly.

For the case of synchronized start, another approach is to dispense with
the timeout task completely, and build special timeout mechanisms into the
main algorithm. Specifically, whenever $p_i$ makes a next-phase transition
from phase $r-1$ to phase $r$, it initializes a counter $counter(j)$ for each
$p_j$. Each counter $counter(j)$ is incremented at each step until either (i) $p_i$
receives an $r$-message (causing it to perform a next-phase transition), or
(ii) the message $(r-1, j)$ is found in $buff$, or (iii) $counter(j)$ reaches
$\lfloor 2D/c_1 \rfloor + 1$. In case (iii), $p_i$ adds $j$ to $halted$. The modified
algorithm is correct since, whenever $p_i$ broadcasts an $(r-1)$-message during a
next-phase transition at phase $r-1$, it should receive either an $(r-1)$-message
or an $r$-message from every nonfaulty nondecided process within time $2D$.
The modified algorithm sends a total of $O(n^2 f)$ messages. Each message has
length $O(\log n)$ bits. By a timing analysis similar to that of Theorem 5.16,
an upper bound of $(2f+1)\Delta + 2CD + c_2 \approx (2f+1)\delta + 2Cd$ can be shown.
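
The per-process counter rule above can be sketched as follows. This is only an illustration under stated assumptions: `counter_limit` and `wait_outcome` are hypothetical names, time parameters are integers or `Fraction` values, and the order in which cases (i) and (ii) are checked within a single step is a choice made here, not something the text specifies.

```python
from fractions import Fraction
import math


def counter_limit(D, c1):
    """Counter value floor(2D/c1) + 1 at which case (iii) fires."""
    return math.floor(Fraction(2 * D) / Fraction(c1)) + 1


def wait_outcome(limit, r_message_step=None, reply_step=None):
    """Which case ends the wait on p_j, counting steps after the transition.

    r_message_step: step at which p_i sees an r-message (case i), or None.
    reply_step: step at which (r-1, j) is found in buff (case ii), or None.
    """
    for count in range(1, limit + 1):
        if r_message_step is not None and r_message_step <= count:
            return "next-phase"   # case (i)
        if reply_step is not None and reply_step <= count:
            return "received"     # case (ii)
    return "halted"               # case (iii): counter reached the limit


if __name__ == "__main__":
    limit = counter_limit(5, 2)   # D = 5, c1 = 2
    print(limit, wait_outcome(limit, reply_step=3))
```

Since consecutive steps are at least $c_1$ apart, the limit multiplied by $c_1$ exceeds $2D$, which is why reaching it is taken as evidence that $p_j$ has halted.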

5.4  Extension  to  Multiple  Values 

In this section we discuss how to modify the algorithm to handle an arbitrary
value set $V$. This is done by running $n$ single-source algorithms in
parallel. In the single-source agreement problem, a single process $p_i$, the
source, starts with an initial value from $V$. Shortly we describe an algorithm
for the single-source problem with the following properties. Let $\bot$
be a distinguished default value in $V$. Suppose that the source has initial
value $v$. Then all nonfaulty processes decide on either $v$ or $\bot$, and all decide
the same; moreover, if the source is nonfaulty, then all nonfaulty processes
decide on $v$. To solve the general agreement problem, run $n$ single-source
algorithms, $A_1, \ldots, A_n$, in parallel with $p_i$ being the source in $A_i$. When
some process $p_j$ has reached a decision $w_i$ in $A_i$ for all $i$, it decides on $w_k$
where $k$ is the least integer such that $w_k \ne \bot$, provided that such a $k$ exists.
If $w_i = \bot$ for all $i$, then $p_j$ decides on $\bot$.
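
The rule for combining the $n$ single-source decisions can be written as a short sketch. The name `combine_decisions` and the use of Python's `None` to stand in for the default value are assumptions made for illustration only.

```python
BOTTOM = None  # stands in for the distinguished default value


def combine_decisions(decisions):
    """decisions[i-1] is the value w_i decided in A_i.

    Returns w_k for the least k with w_k different from the default value,
    or the default value if every w_i equals it.
    """
    for w in decisions:
        if w is not BOTTOM:
            return w
    return BOTTOM


if __name__ == "__main__":
    print(combine_decisions([BOTTOM, "b", "a"]))
```

Because every nonfaulty process obtains the same vector of single-source decisions, applying this deterministic rule yields the same final value at each of them.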

To describe a solution to the single-source problem, we refer to the
algorithm of Figure 2 as the binary algorithm. Let $p_i$ be the source, and let
$v_i \in V$ be the initial value of $p_i$. Initially, $p_i$ begins the binary algorithm
as though it has initial value 0, and the other processes begin with value
1. During phase 0, $p_i$ broadcasts the message $(v_i, (1, i))$; i.e., it sends the
message $(1, i)$ that the binary algorithm would send, with the value $v_i$
piggybacked. After this broadcast, $p_i$ decides $v_i$. Any process that receives
this message during phase 1 remembers $v_i$, broadcasts $(v_i, (1, i))$, and
otherwise acts in the binary algorithm as though the message $(1, i)$ had been
received. The binary algorithm is then run to completion. If a process
decides 0 (resp., 1) in the binary algorithm, it decides $v_i$ (resp., $\bot$) in the
single-source algorithm. (The analysis below shows that if $p_j$ decides 0 in
the binary algorithm, then $p_j$ receives $v_i$ during phase 1.)

To argue correctness, first consider the case that the source $p_i$ is
nonfaulty. It is easy to see in this case that all nonfaulty processes (except the
source) decide 0 at phase 2 in the binary algorithm, so all decide $v_i$ in the
single-source algorithm. If $p_i$ is faulty, let $R$ be the set of processes that
receive $(v_i, (1, i))$ during phase 1. Any process not in $R$ either fails or
performs a decision transition at phase 1. If any such process decides, then all
nonfaulty processes decide 1. If all processes that are not in $R$ fail before
deciding, then any process $p_j$ that does decide is in $R$, so $p_j$ receives $v_i$
during phase 1.


6  The  Lower  Bound 


In this section we prove our lower bound of $(f-1)d + Cd$ on the time to
reach agreement in the timing-based model. The proof requires four steps
and employs techniques used elsewhere in proving lower bounds and
impossibility results in the rounds model, the completely asynchronous model,
and the timing-based model. The first step is an adaptation of the proof
showing that $f+1$ rounds are necessary for Byzantine agreement in the
rounds model [FL82, DLM82, DS83, LF82, H84, M85, CD86, DM86]. As
we shall see, this adaptation yields the existence of two “long” (i.e., taking
time at least $(f-1)d$) timed execution prefixes, $\alpha_0$ and $\alpha_1$, each having only
$f-1$ faults, distinguishable only to one process, and each extendible to a
timed execution with a different decision value. The second step mimics a
key lemma in the proof that agreement is impossible in asynchronous
systems [FLP85, DDS87]. In this step it is shown that at least one of $\alpha_0$ and
$\alpha_1$ is “bivalent,” in that it has two possible extensions with no additional
failures, each yielding a different decision value, and in each of which
processes take steps as quickly as possible. In showing bivalence, we also use
an “execution retiming” technique of [AL89]. The third step extends the
bivalent timed execution prefix to a “maximal” bivalent prefix, having at
most $f-1$ faults. The fourth and last step exploits the one remaining fault,
via another retiming argument, to show that after this maximal bivalent
timed execution prefix at least one “long timeout” (taking time at least $Cd$)
is necessary.

We assume throughout this section that $c_1 \le d$, $\delta = d$, and $f \ge 1$.

6.1  Synchronous  Timed  Executions 

Our  lower  bound  arguments  for  algorithms  in  the  timing-based  model  will 
be  based  on  a  subset  of  the  timed  executions  which  we  call  “synchronous.” 
We  define  these  in  this  subsection. 

We  think  of  a  synchronous  timed  execution  as  a  sequence  of  “blocks”; 
each  block  is  composed  of  a  sequence  of  message  deliveries  followed  by  a 
sequence  of  process  steps;  all  the  process  steps  in  one  block  occur  at  the  same 
time,  and  each  block  contains  exactly  one  (computation  or  failure)  step  by 
each  process.  More  precisely,  we  say  that  a  timed  execution  is  synchronous 
provided that there is a monotone increasing sequence of times,
$t_0, t_1, t_2, \ldots$, such that $t_0 = 0$ and the following conditions are satisfied.

1. Exactly one input event occurs at each process, and it occurs at time 0.

2. Each computation and failure event occurs at time $t_i$, for some $i$. At
each time $t_i$, there is exactly one computation or failure event for each
process, and these events occur in order of process indices.

3. All input events precede all computation and failure events that occur
at time 0.

4. All message delivery events that occur at a time $t_i$ precede all
computation and failure events that occur at the same time.

A  block  in  a  synchronous  timed  execution  can  then  be  identified  with 
the portion of the execution occurring at times in the interval $(t_i, t_{i+1}]$ for
any  particular  i.  A  (finite)  timed  execution  prefix  is  said  to  be  synchronous 
provided  that  it  is  a  prefix  of  a  synchronous  timed  execution  and  it  ends 
with  a  computation  or  failure  step  of  process  pn. 
Now suppose that $\alpha$ is a synchronous timed execution prefix. If $\gamma = \alpha\beta$
is a synchronous timed execution or a synchronous timed execution prefix,
we say that $\gamma$ is a failure-free extension (or simply ff-extension) of $\alpha$ if no
failures occur in $\beta$. We say that $\gamma$ is a fast extension of $\alpha$ if the times for
computation and failure steps in $\gamma$ that are greater than $t_{end}(\alpha)$ are exactly
all the times that are of the form $t_{end}(\alpha)$ plus a positive multiple of $c_1$.
Similarly, $\gamma$ is a slow extension of $\alpha$ if the computation and failure step
times are all those of the form $t_{end}(\alpha)$ plus a positive multiple of $c_2$.
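
As a small illustration of the two kinds of extension, the sketch below lists the first few step times that follow a prefix ending at a given time. The name `extension_step_times` is hypothetical; a fast extension uses step spacing $c_1$ and a slow extension uses $c_2$.

```python
def extension_step_times(t_end, spacing, count):
    """First `count` step times after t_end: t_end plus positive multiples of spacing."""
    return [t_end + k * spacing for k in range(1, count + 1)]


if __name__ == "__main__":
    c1, c2 = 2, 6                      # illustrative step bounds, so C = c2 / c1 = 3
    print(extension_step_times(10, c1, 3))   # fast extension
    print(extension_step_times(10, c2, 3))   # slow extension
```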

6.2  Existence  of  Long  Prefixes 

For  the  first  step,  we  show  the  existence  of  the  two  long  timed  execution 
prefixes  mentioned  above.  Since  we  do  this  by  adapting  a  proof  from  the 
rounds  model,  it  is  useful  for  us  to  restrict  attention  to  a  subclass  of  the 
synchronous  timed  executions  that  look  more  like  executions  of  the  rounds 
model.  In  particular,  we  will  consider  timed  executions  in  which  messages 
are  delivered  in  batches  at  times  that  are  positive  multiples  of  d.  Also, 
although  step  time  is  irrelevant  here,  we  say  (to  be  specific)  that  processes 
take steps at every multiple of $c_1$, starting with 0. Formally, we define the
uniform timed executions to be those synchronous timed executions in which

1. for every integer $r \ge 1$, any message that is sent at time $t$, with
$(r-1)d \le t < rd$, is delivered at time $rd$, and

2. each step time $t_i$ is equal to $i c_1$.

Also, the uniform timed execution prefixes are defined to be the timed
execution prefixes that are prefixes of uniform timed executions and end with
a computation or failure event of $p_n$.
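
The batching rule for uniform timed executions can be sketched as a pair of functions. The names are hypothetical, arguments are assumed to be integers or `Fraction` values, and the reading of the send-time window as $(r-1)d \le t < rd$ is an assumption where the inequalities are not fully legible in the source.

```python
def delivery_time(t, d):
    """Delivery time rd of a message sent at time t, where (r-1)d <= t < rd."""
    r = t // d + 1          # index of the batch that the send time falls into
    return r * d


def step_time(i, c1):
    """Time of the i-th step in a uniform timed execution."""
    return i * c1


if __name__ == "__main__":
    d, c1 = 5, 2
    for i in range(4):
        t = step_time(i, c1)
        print(t, delivery_time(t, d))
```

Under this reading every message is in transit for more than zero and at most $d$ time units, so the delivery bound $d$ is respected.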

Uniform  timed  executions  are  similar  to  executions  in  the  rounds  model. 
For example, if $c_1 = d$, then there is a direct correspondence between the
two.  In  general  uniform  executions,  however,  a  process  may  take  several 
steps  (and  send  at  several  different  times)  within  each  round  of  message 
exchange. 

The basic lower bound result for agreement in the rounds model asserts
that, for $f \le n-2$, agreement in the presence of stopping failures requires
$f+1$ rounds [LF82, H84, M85, CD86, DM86]. The proof of this result
contains a key lemma that shows, loosely speaking, that for any agreement
algorithm all execution prefixes with at most $f$ rounds in which at most one
process fails in each round are similar. Two execution prefixes are directly
similar if some nonfaulty process cannot “distinguish between” them. The
similarity relation is the transitive closure of the direct similarity relation.

By redefining “directly similar” so that two execution prefixes are
directly similar if at most one process can distinguish between them, and
redefining “similar” accordingly, it is easy to modify this standard proof
to apply to our uniform timed executions and to yield a slightly stronger
conclusion. In this way, we obtain the following lemma.⁶

We define two timed execution prefixes, $\alpha_0$ and $\alpha_1$, with
$t_{end} = t_{end}(\alpha_0) = t_{end}(\alpha_1)$, to be indistinguishable to process $p_i$
provided that (a) the sequence of timed events occurring at $p_i$ and the sequence
of intervening local states of $p_i$ are identical in $\alpha_0$ and $\alpha_1$, with the
exception that corresponding fail events of $p_i$ in the two event sequences can
send different sets of messages, and (b) the messages which are sent to $p_i$
strictly before time $t_{end}$, together with their senders and sending times, are
identical in $\alpha_0$ and $\alpha_1$. The sequences $\alpha_0$ and $\alpha_1$ are said to be
distinguishable to $p_i$ if they are not indistinguishable to $p_i$.

Lemma 6.1 Let A be an n-process algorithm in the timing-based model that
solves the agreement problem for $f < n-1$ faults. Let $k$ be a nonnegative
integer, $k \le f-1$. Then there are two (uniform) timed execution prefixes,
$\alpha_0$ and $\alpha_1$, satisfying the following conditions:

1. $t_{end}(\alpha_j) = \lceil kd/c_1 \rceil c_1$, for $j = 0, 1$.⁷

2. There is a fast ff-extension of $\alpha_j$ in which some process decides $j$, for
$j = 0, 1$.

3. If $F_j$ is the set of processes that are faulty in $\alpha_j$, $j = 0, 1$, then
$|F_0 \cup F_1| \le k$, and

4. There is at most one process to which $\alpha_0$ and $\alpha_1$ are distinguishable.

⁶ For those who are familiar with the earlier proofs: The proof involves constructing
a “chain” of timed execution prefixes. Each pair of consecutive prefixes either (a) have
identical sets of failed processes and differ only in the presence or absence of one particular
message $m$ sent by a faulty process $p_i$ to a process $p_j$; moreover, $p_j$ does not send any
messages (in either prefix) at or after the delivery time of $m$ and strictly prior to $t_{end}$, or
(b) differ only in that one process that sends all its messages at some time $t_i$ but none
thereafter, in both prefixes, does a failure transition at time $t_i$ in one case and at $t_{i+1}$ in
the other case, or (c) differ only in that one process that sends all its messages at time
$t_{end}$ does a failure transition at time $t_{end}$ in one prefix and does not fail in the other
prefix, or (d) differ only in the initial value of one process that fails at time 0 and sends
no messages.

⁷ Note that the time $\lceil kd/c_1 \rceil c_1$ is the least multiple of $c_1$ greater than or equal to $kd$.
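
The end time used in Lemma 6.1 is a ceiling computation; a minimal sketch follows. The name `prefix_end_time` is hypothetical, and exact arithmetic with `Fraction` is used so that the ceiling is not affected by floating-point rounding.

```python
from fractions import Fraction
import math


def prefix_end_time(k, d, c1):
    """Least multiple of c1 that is greater than or equal to k*d."""
    return math.ceil(Fraction(k * d) / Fraction(c1)) * c1


if __name__ == "__main__":
    # Illustrative values: k = 2 rounds of length d = 5, step time c1 = 3.
    print(prefix_end_time(2, 5, 3))
```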
6.3 Existence of a Long Bivalent Prefix

For the second step, we show that, under the assumption that agreement
can be reached in time strictly less than $(f-1)d + Cd$, both decisions are
still possible after at least one of $\alpha_0$, $\alpha_1$. In order to do this, we need to
formalize the notion that “both decisions are still possible” after a prefix.
Let $\alpha$ be a synchronous timed execution prefix.

We say that a value $v \in \{0, 1\}$ is fast failure-free-reachable (or just fast
ff-reachable) from $\alpha$ if there is a synchronous fast failure-free extension $\gamma$ of
$\alpha$ such that some process decides $v$ in $\gamma$. We say that $\alpha$ is 0-valent if only 0
is fast ff-reachable from $\alpha$, and 1-valent if only 1 is fast ff-reachable. We say
that $\alpha$ is univalent if it is either 0-valent or 1-valent, and that $\alpha$ is bivalent
if both 0 and 1 are fast ff-reachable from $\alpha$.⁸

The  next  lemma  is  the  key  for  completing  the  proof  of  the  lower  bound. 
It  shows  that  there  cannot  be  two  “long”  execution  prefixes  (i.e.,  prefixes 
that  end  at  a  “late”  time)  that  have  opposite  valence,  that  do  not  contain 
many  faults,  and  that  are  distinguishable  to  at  most  one  process. 


Lemma 6.2 Let A be an algorithm in the timing-based model that solves
the agreement problem for $f < n-1$ faults within time strictly less than
$t + Cd$.

Then there cannot be two synchronous timed execution prefixes, $\alpha_0$ and
$\alpha_1$, satisfying the following properties:

1. $t_{end}(\alpha_0) = t_{end}(\alpha_1) \ge t$,

2. $\alpha_j$ is $j$-valent, $j = 0, 1$,

3. if $F_j$ is the set of processes that are faulty in $\alpha_j$, $j = 0, 1$, then
$|F_0 \cup F_1| \le f-1$, and

4. there is at most one process to which $\alpha_0$ and $\alpha_1$ are distinguishable.


Proof: Suppose, by way of contradiction, that such prefixes $\alpha_0$ and $\alpha_1$
exist. Let $F$ be the union of $F_0$, $F_1$, and the set (of size at most 1) of
processes to which $\alpha_0$ and $\alpha_1$ are distinguishable; note that $|F| \le f$. Let
$\alpha'_0$ be a synchronous timed execution prefix that is identical to $\alpha_0$ except
that each $p_i \in F$ does a failure step in which it sends no messages at time
$t_{end}$ if it has not failed previously in $\alpha_0$. Let $\gamma_0$ be a slow ff-extension of $\alpha'_0$.


⁸ The terminology is derived from that of [FLP85], although the definitions are not
exactly equivalent.
Let $\gamma_1$ be constructed in a similar way from $\alpha_1$, subject to the additional
condition that the portion of $\gamma_1$ after time $t_{end}$ is identical to the portion of
$\gamma_0$ after time $t_{end}$. This is possible since $\alpha'_0$ and $\alpha'_1$ are indistinguishable to
all processes other than those in $F$, and moreover all messages in transit to
these processes at time $t_{end}$ are the same in $\alpha'_0$ and $\alpha'_1$.

Since $|F| \le f$, it follows that each of $\gamma_0$ and $\gamma_1$ is $f$-admissible. Since
$t_{end} \ge t$ and the algorithm decides before time $t + Cd$, all the nonfaulty
processes, i.e., those processes not in $F$, decide in each of $\gamma_0$ and $\gamma_1$ strictly
before time $t_{end} + Cd$. Since $\gamma_0$ and $\gamma_1$ are indistinguishable to all processes
other than those in $F$, they have the same decision value $v$. Fix $j = 1 - v$.
(This makes sense because $v \in \{0, 1\}$.)

Let $\gamma'_j$ be a retiming of $\gamma_j$ that keeps the times of all events up to and
including $t_{end}$ the same, and that causes every event that occurs at time
$t_{end} + u$ in $\gamma_j$, for $u > 0$, to occur at time $t_{end} + u/C$ in $\gamma'_j$. Then all
processes not in $F$ decide $v$ in $\gamma'_j$, strictly before time $t_{end} + d$.

Now let $\gamma''_j$ be a fast ff-extension of $\alpha_j$ in which any messages sent by
processes in $F$ at times greater than or equal to $t_{end}$ take time exactly $d$ to
be delivered, and such that $\gamma''_j$ looks exactly like $\gamma'_j$ to all processes except
those in $F$ at times before $t_{end} + d$. Since the processes not in $F$ cannot tell
the difference between $\gamma''_j$ and $\gamma'_j$ strictly before time $t_{end} + d$, all processes
not in $F$ must decide $v$ in $\gamma''_j$.

But since $\gamma''_j$ is a fast ff-extension of $\alpha_j$ and $\alpha_j$ is $j$-valent, the processes
that are nonfaulty in $\gamma''_j$ must decide $j$ in $\gamma''_j$. Since the processes not in $F$
are nonfaulty in $\gamma''_j$, this is a contradiction. ■
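
The retiming step in this proof compresses everything after $t_{end}$ by the factor $C$. The sketch below shows that transformation on a list of timed events; the name `retime` and the representation of events as `(time, label)` pairs are assumptions made for illustration.

```python
from fractions import Fraction


def retime(events, t_end, C):
    """Keep events at times <= t_end; move an event at t_end + u to t_end + u / C."""
    retimed = []
    for time, label in events:
        if time > t_end:
            time = t_end + Fraction(time - t_end) / C
        retimed.append((time, label))
    return retimed


if __name__ == "__main__":
    # Slow steps spaced c2 = 6 after t_end = 10, with C = 3 (so c1 = 2).
    slow = [(4, "step"), (10, "step"), (16, "step"), (22, "decide")]
    print(retime(slow, 10, 3))
```

Steps that were $c_2$ apart become $c_2/C = c_1$ apart, so the retimed extension is fast, and a decision made before $t_{end} + Cd$ now falls before $t_{end} + d$.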

Corollary 6.3 Let A be an algorithm in the timing-based model that solves
the agreement problem for $f < n-1$ faults within time strictly less than
$(f-1)d + Cd$. Then there is an $(f-1)$-admissible synchronous timed
execution prefix $\alpha$ such that the following conditions hold:

1. $t_{end}(\alpha) = \lceil (f-1)d/c_1 \rceil c_1$, and

2. $\alpha$ is bivalent.

Proof: Let $\alpha_0$ and $\alpha_1$ be obtained by setting $k = f-1$ in Lemma 6.1.
We show that at least one of $\alpha_0$ and $\alpha_1$ has the required properties. All
properties except the bivalence are immediate, so we must show that at
least one of $\alpha_0$ and $\alpha_1$ is bivalent. We proceed by contradiction. Assume
that neither of $\alpha_0$ and $\alpha_1$ is bivalent. Then for $j = 0, 1$, since a decision of
$j$ is possible in a fast ff-extension of $\alpha_j$ (by Lemma 6.1), it must be that
$\alpha_j$ is $j$-valent. But then $\alpha_0$ and $\alpha_1$ satisfy all the conditions described in
the statement of Lemma 6.2, where $t = (f-1)d$. Lemma 6.2 then yields a
contradiction. ■


6.4  Existence  of  a  Long  Maximal  Bivalent  Prefix 

For the third step, we construct a “maximal” finite bivalent extension $\alpha'$ of
the bivalent timed execution prefix obtained in the previous lemma. Roughly
speaking, the end of $\alpha'$ is a branch point, from which both decisions are still
fast ff-reachable and such that at the next step time in any fast ff-extension
of $\alpha'$ the decision must be determined.

Lemma 6.4 Let A be an algorithm in the timing-based model that solves
the agreement problem for $f < n-1$ faults within time strictly less than
$(f-1)d + Cd$. Then A has an $(f-1)$-admissible synchronous timed execution
prefix $\alpha'$ such that

1. $t_{end}(\alpha') \ge (f-1)d$ and

2. $\alpha'$ is bivalent,

and such that there are two fast ff-extensions of $\alpha'$, $\beta_j$, $j = 0, 1$, satisfying
the following properties:

1. $\beta_j$ is an extension of $\alpha'$ by exactly one block, $j = 0, 1$,

2. $\beta_j$ is $j$-valent, $j = 0, 1$, and

3. $\beta_0$ and $\beta_1$ are indistinguishable to all but at most one process.

Proof: By Corollary 6.3, A has an $(f-1)$-admissible synchronous timed
execution prefix $\alpha$ satisfying the following properties:

1. $t_{end}(\alpha) = \lceil (f-1)d/c_1 \rceil c_1$, and
2. $\alpha$ is bivalent.

Let $\Gamma$ be the set of finite bivalent fast ff-extensions of $\alpha$. Each such
extension must have its final time strictly less than $(f-1)d + Cd$, since A
is assumed to decide within that time. Since each block takes time $c_1$, there
must exist a maximal element of $\Gamma$, i.e., one that has no proper extensions
in $\Gamma$; let $\alpha'$ be such a maximal element.
Let $\Theta$ be the set of all finite fast ff-extensions of $\alpha'$ consisting of $\alpha'$
followed by a single block. In other words, every $\beta \in \Theta$ consists of $\alpha'$ followed
by a sequence of message deliveries and a single step by each process. Since
fast ff-extensions are synchronous, $t_{end}(\beta) = t_{end}(\alpha') + c_1$ for each $\beta \in \Theta$.
By maximality of $\alpha'$, every timed execution prefix in $\Theta$ is univalent. Since $\alpha'$
is bivalent, there must be at least one such extension that is 0-valent and at
least one that is 1-valent. (This uses the fact that bivalence is by definition
with respect to fast ff-extensions.) Let $\beta'_j \in \Theta$ be $j$-valent, for $j = 0, 1$.

Now we construct a sequence, $\beta''_i$, $0 \le i \le n$, of elements of $\Theta$ such that
$\beta''_0 = \beta'_0$, $\beta''_n = \beta'_1$, and for all $i$, $1 \le i \le n$, $\beta''_{i-1}$ and $\beta''_i$ are
indistinguishable to all processes other than $p_i$. The construction is inductive.
First define $\beta''_0 = \beta'_0$. Then for each $i$, $1 \le i \le n$, define $\beta''_i \in \Theta$ to be the
same as $\beta''_{i-1}$ except that the message deliveries to $p_i$ in $\beta''_i$ are as in $\beta'_1$.
(Since all the messages delivered to $p_i$ in $\beta'_1$ are sent by time $t_{end}(\alpha')$, such
a $\beta''_i$ exists.)

Since each $\beta''_i \in \Theta$, it is univalent. Since $\beta''_0$ is 0-valent and $\beta''_n$ is 1-valent,
there must exist $i$, $1 \le i \le n$, such that $\beta''_{i-1}$ is 0-valent and $\beta''_i$ is 1-valent.
Then defining $\beta_0 = \beta''_{i-1}$ and $\beta_1 = \beta''_i$ suffices to prove the lemma. ■
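
The chain construction in this proof can be illustrated with a small sketch. It models a one-block extension only by the list of message deliveries made to each process, which is a simplification assumed here; the names `hybrid` and `find_flip` are hypothetical.

```python
def hybrid(deliveries0, deliveries1, i):
    """Deliveries in the i-th chain element: as in the 1-valent block for
    p_1..p_i, and as in the 0-valent block for the remaining processes."""
    return list(deliveries1[:i]) + list(deliveries0[i:])


def find_flip(valences):
    """Given valences[0] == 0 and valences[-1] == 1 (each entry 0 or 1),
    return an index i with valences[i-1] == 0 and valences[i] == 1."""
    for i in range(1, len(valences)):
        if valences[i - 1] == 0 and valences[i] == 1:
            return i
    raise ValueError("chain must start 0-valent and end 1-valent")


if __name__ == "__main__":
    d0, d1 = ["a", "b", "c"], ["x", "y", "z"]
    print([hybrid(d0, d1, i) for i in range(4)])
    print(find_flip([0, 0, 1, 1]))
```

Consecutive chain elements differ only in the deliveries to a single process, which is what makes the pair returned by the search indistinguishable to all but at most one process.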

6.5  The  Final  Step 

For  the  final  step  of  our  proof,  we  now  use  Lemma  6.2  once  again  to  yield 
our  main  lower  bound  theorem. 

Theorem 6.5 Assume $1 \le f < n-1$. There is no algorithm in the
timing-based model that solves the agreement problem for $f$ faults within time
strictly less than $(f-1)d + Cd$. Moreover, this lower bound holds in the case of
synchronized start.

Proof: Suppose, by way of contradiction, that such an algorithm A exists.
Then Lemma 6.4 yields an $(f-1)$-admissible synchronous timed execution
prefix $\alpha'$ such that $t_{end}(\alpha') \ge (f-1)d$ and $\alpha'$ is bivalent, and such that
there are two fast ff-extensions of $\alpha'$, $\beta_j$, $j = 0, 1$, satisfying the following
properties:

1. $\beta_j$ is an extension of $\alpha'$ by exactly one block, $j = 0, 1$,

2. $\beta_j$ is $j$-valent, $j = 0, 1$, and

3. $\beta_0$ and $\beta_1$ are indistinguishable to all but at most one process.

But then $\beta_0$ and $\beta_1$ satisfy all the conditions in Lemma 6.2, with $t = (f-1)d$.
This immediately yields a contradiction. ■
Remark 4 The lower bound obtained in this proof is not always the best
possible. If $d = kc_2 + \varepsilon$ for some integer $k$ then we can actually obtain a
bound of $(f-1)(d + c_2 - \varepsilon) + Cd$. Since in theory $\varepsilon$ can be arbitrarily small,
we get essentially $(f-1)D + Cd$ in the worst case.

7  Implications  for  Synchronous  Processes  with 
Message  Delivery  Uncertainty 

In the Introduction, we indicated that our results could be applied to the
model used in [HK89], in which process steps are completely synchronous,
that is, $c_1 = c_2$, so $C = 1$, and in which $\delta$, the actual message delivery bound
in a particular execution, can be much smaller than the worst-case message
delivery time $d$. In this section, we say more about these applications.

First, we consider the cost of implementing the timeout task in the $C = 1$
model. The timeout strategy of Section 3 yields a timeout bound $T$ of at
most $d + \delta + 3c_1$. However, since processes are synchronous, the timeout
bound can be improved slightly, using a different strategy. Process $p_j$
broadcasts the message $(alive, j, k)$ at its $k$-th step for all $k$. If process $p_i$ has not
received the message $(alive, j, k)$ by its $(k + \lfloor d/c_1 \rfloor + 1)$-th step, then $p_i$ adds
$p_j$ to its set of halted processes. This strategy gives a timeout bound of
$T = d + 2c_1$.
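
The improved timeout rule for the $C = 1$ model reduces to simple step arithmetic, sketched below. The names `deadline_step` and `improved_timeout_bound` are hypothetical, and the arguments are assumed to be integers or `Fraction` values so that `//` computes an exact floor.

```python
def deadline_step(k, d, c1):
    """Step of p_i by which (alive, j, k) must have arrived: k + floor(d/c1) + 1."""
    return k + d // c1 + 1


def improved_timeout_bound(d, c1):
    """Timeout bound T = d + 2*c1 stated for this strategy."""
    return d + 2 * c1


if __name__ == "__main__":
    d, c1 = 10, 4          # illustrative worst-case delay and step time
    print(deadline_step(3, d, c1), improved_timeout_bound(d, c1))
```

The waiting period of $\lfloor d/c_1 \rfloor + 1$ steps is the smallest number of steps whose total duration exceeds $d$, so a message from a live process cannot miss the deadline.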

We consider the simple upper and lower bounds for agreement. The
simple upper bound of approximately $(f+1)Cd$ of Section 4 specializes to
yield an upper bound of approximately $(f+1)d$, even for executions in which
$\delta < d$. On the other hand, a simple lower bound, obtained by adapting the
$(f+1)$ round lower bound for the rounds model, is $(f+1)\delta$. This leaves a
gap of a multiplicative factor of $d/\delta$.

The main algorithm of this paper helps to close this gap. Since we carried
out the analysis of our main algorithm in terms of $\delta$ and $T$, it is easy to
translate the result to the $C = 1$ model. Using the improved timeout bound
above, we conclude that the algorithm runs in time

$(2f-1)\Delta + \max\{d, 3\delta\} + 3c_1$,

or approximately $(2f-1)\delta + \max\{d, 3\delta\}$ if $c_1 \ll \delta$. Therefore, the number
of faults multiplies the actual message delay $\delta$ rather than the worst-case
delay $d$.
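
To make the comparison concrete, the sketch below evaluates the running-time expression above next to the simple upper bound of approximately $(f+1)d$. The function names are hypothetical, $\Delta$ is passed in as its own parameter rather than derived, and the sample numbers are illustrative only.

```python
def main_algorithm_time(f, cap_delta, delta, d, c1):
    """(2f-1)*Δ + max{d, 3δ} + 3*c1, the bound stated for the C = 1 model."""
    return (2 * f - 1) * cap_delta + max(d, 3 * delta) + 3 * c1


def simple_upper_bound(f, d):
    """Approximate bound (f+1)*d of the simple round-based adaptation with C = 1."""
    return (f + 1) * d


if __name__ == "__main__":
    f, cap_delta, delta, d, c1 = 3, 2, 1, 100, 1
    print(main_algorithm_time(f, cap_delta, delta, d, c1), simple_upper_bound(f, d))
```

With these sample values the worst-case delay $d$ is paid once in the first expression and $f+1$ times in the second, which is the effect described in the text.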

We note that the methods of [DLS88] give a completely different
agreement algorithm in the $C = 1$ model with time complexity $O(n\delta)$, provided
that $n \ge 2f+1$. (The methods of [DLS88] do not work when $n \le 2f$.)

We now consider lower bounds in the $C = 1$ model. The lower bound
techniques of this paper can be modified to give a lower bound of time
$(2f-n)\delta + d$ provided that $f+1 \le n \le 2f$. More specifically, in the case
where $n \le 2f$, a “partitioning” argument, similar to ones used in [BT85] and
[DLS88], easily gives a lower bound of $d$, even in certain executions in which
the actual message delay $\delta$ is $c_1$, so messages are being delivered essentially as
fast as possible. By combining the partitioning argument with the argument
used to prove the $(f+1)$ round lower bound (see the discussion preceding
Lemma 6.1), a lower bound of $(2f-n)\delta + d$ can be shown if $f+1 \le n \le 2f$.
This bound can be compared to the upper bound of roughly $(2f-1)\delta + d$
described above. In the case $n > 2f$, the upper bound $O(n\delta)$ shows that
the time need not depend on $d$ at all.

8  Conclusions  and  Open  Questions 

Although there is a gap between our lower bound of $(f-1)d + Cd$ and
our upper bound of approximately $2fd + Cd$, we feel we have substantially
answered the question of how the time requirement depends on the timing
uncertainty, as measured by $C = c_2/c_1$. In particular, we have shown that
only a single “long timeout” (i.e., a timeout requiring time $Cd$) is required,
and this long timeout cannot be avoided. Similarly, for the case in which
$C = 1$, we have shown that the time depends on the worst-case message
delivery time $d$ only once.

An  obvious  open  problem  is  to  close  the  gap  between  the  lower  and  upper 
bounds.  Another  question  is  whether  these  results  can  be  extended  to  other 
types of failures such as Byzantine or omission failures. Some results on this
last question have already been obtained by Ponzio [P90].

A more general direction for future research is to try to extend the
techniques described in this paper to permit simulation of arbitrary round-based
fault-tolerant  algorithms  in  the  model  with  timing  uncertainty.  The  hope  is 
that  such  a  simulation  will  not  incur  the  multiplicative  overhead  of  T  of  the 
simple  transformation  described  in  Section  4. 

Our  algorithms  assume  that  each  message  is  delivered  within  at  most 
time  d  under  all  circumstances,  in  particular,  even  if  the  message  delivery 
system  is  overloaded  with  messages.  A  more  reasonable  assumption  is  that 
all  messages  get  delivered  within  at  most  time  d ,  provided  that  the  number 
of  messages  in  transit  is  bounded.  The  algorithms  we  present  in  this  paper 
send only a bounded number of messages, and so would work under such a
restriction.  Our  lower  bound  does  not  rely  on  this  restriction,  and  carries 
over  a  fortiori  for  the  restricted  case.  Some  preliminary  quantitative  results 
relating  the  time  complexity  of  a  timeout  task  to  the  capacity  of  the  channels 
appear  in  [P90]. 

As  mentioned  earlier,  the  work  presented  in  this  paper  is  part  of  an 
ongoing  effort  to  obtain  a  precise  understanding  of  the  role  played  by  time, 
and  timing  uncertainty  in  particular,  in  distributed  systems.  The  upper 
bound  presented  in  this  paper  is  based  on  an  approach  that  departs  from 
known  algorithms  for  agreement  in  the  synchronous  model.  We  believe 
that  there  are  many  other  fundamental  tasks  in  distributed  systems  whose 
study  might  lead  to  the  discovery  of  new  approaches  for  coping  with  timing 
uncertainties. 